Finite-state Recognizers 1 Zvi Kohavi and Niraj K. Jha Deterministic Recognizers Treat FSM as a recognizer that classifies input strings into two classes: strings it accepts and strings it rejects Tape 1 0 0 1 0 0 1 1 Head Finite control Finite-state recognizer: • Equivalent to a string of input symbols that enter the machine at successive times • Finite-state control: Moore FSM • States in which output symbol is 1 (0): accepting (rejecting) states • A string is accepted by an FSM: if and only if the state the FSM enters after having read the rightmost symbol is an accepting state • Set of strings recognized by an FSM: all input strings that take the FSM from its starting state to an accepting state 2 Transition Graph Example: a machine that accepts a string if and only if the string begins and ends with a 1, and every 0 in the string is preceded and followed by at least a single 1 1 A 1 A 1 0 0 0 0,1 1 C B (a) Deterministic state diagram. B (b) Transition graph. Transition graph: consists of a set of vertices and various directed arcs connecting them • • • • At least one of the vertices is specified as a starting vertex Arcs are labeled with symbols from the input alphabet A vertex may have one or more Ii-successors or none It accepts a string if the string is described by at least one path emanating from a starting vertex and terminating at an accepting vertex • It may be deterministic or non-deterministic 3 Example Example: 1110 and 11011 accepted by the transition graph below, but 100 rejected A D 0 1 1 1 1 0 B 0 C Equivalent transition graphs: two or more graphs that recognize the same set of strings • Each graph below accepts a string: if and only if each 1 is preceded by at least two 0’s 0 0 0 A 1 B 0 C 0 A B 0 C 1 4 Graphs Containing -Transitions -transitions: when no input symbol is used to make the transition Example: Graph that recognizes a set of strings that start with an even number of 1’s, followed by an even number of 0’s, and end with substring 101 B D 0 1 1 0 1 C A E 0 F 1 G (a) A graph containing a -transition. B D 1 0 0 1 A 0 C 1 E 0 F 1 G 1 (b) An equivalent graph without -transitions. 5 Converting Nondeterministic into Deterministic Graphs Example: Transition graph and its transition table 0,1 A 0 1 0 C 0 C 1 A B AC C AB A 1 B (a) Transition graph. • (b) Transition table. Successor table and deterministic graph: 0 AB AB 0 1 C AC C 0 1 1 C AB A AC ABC A A C 0 1 AC A 1 0 ABC ABC AC ABC 0 (a) Successor table. 0,1 (b) State diagram of an equivalent deterministic machine. 6 Theorem Theorem: Let S be a set of strings that can be recognized by a nondeterministic transition graph Gn. Then S can also be recognized by an equivalent deterministic graph Gd. Moreover, if Gn has p vertices, Gd will have at most 2p vertices 7 Regular Expressions Example: Sets of strings and the corresponding expression • • • • Graph (a) recognizes set {101}: expression denoted as 101 Graph (b) recognizes set {01,10}: expression = 01 + 10 Graph (c) recognizes {0111,1011}: expression = 0111 + 1011 – Concatenation of 01 + 10 and 11 Graph (d) recognizes set { ,1,11,111,1111,…}: expression = 1* 1 0 1 (a) 0 1 1 0 1 0 1 0 (b) 1 1 1 (d) (c) 8 Regular Expressions (Contd.) Example: 01(01)* = 01 + 0101 + 010101 + 01010101 + … • R* = + R + R2 + R3 + … Example: Set of strings on {0,1} beginning with a 0 and followed only by 1’s: 01* Example: Set of strings on {0,1} containing exactly two 1’s: 0*10*10* Example: Set of all strings on {0,1}: (0+1)* = + 0 + 1 + 00 + 01 + 10 + 11 + 000 + … Example: Set of strings on {0,1} that begin with substring 11: 11(0+1)* Example: Transition graphs and the sets of strings they recognize B 1 0 0 1 A D 1 1 E A B 1 0 C 9 (a) (01 + 10)*11. (b) (10*)*. Definition and Basic Properties Let A = {a1,a2,…,ap} be a finite alphabet: then the class of regular expressions over alphabet A is defined recursively as follows: • Any symbol, a1, a2, …, ap alone is a regular expression: as are null string and empty set • If P and Q are regular expressions: then so is their concatenation PQ and their union P+Q – If P is a regular expression: then so is its closure P* • No other expressions are regular: unless they can be generated in a finite number of applications of the above rules Recognizers for and : A (a) A graph accepting . B (b) A graph accepting . 10 Identities +R=R R = R = R = R = R * = * = Set of strings that can be described by a regular expression: regular set • Not every set of strings is regular • Set over {0,1}, which consists of k 0’s (for all k), followed by a 1, followed in turn by k 0’s, is not regular: 010 + 00100 + 0001000 + … + 0k10k + … – Requires an infinite number of applications of the union operation • However, certain infinite sums are regular – Set consisting of alternating 0’s and 1’s, starting and ending with a 1: 1(01)* 11 Manipulating Regular Expressions A regular set may be described by more than one regular expression • Such expressions are called equivalent Example: Alternating 0’s and 1’s, starting and ending with 1 • 1(01)* or (10)*1 Let P, Q, and R be regular expressions: then R+R=R PQ + PR = P(Q+R); PQ + RQ = (P + R)Q R*R* = R* RR* = R*R (R*)* = R* + RR* = R* (PQ)*P = P(QP)* (P + Q)* = (P*Q*)* = (P* + Q*)* = P*(QP*)* = (P*Q)*P* + (P + Q)*Q = (P*Q)* 12 Examples Example: Prove that the set of strings in which every 0 is immediately followed by at least two 1’s can be described by both R1 and R2, where R1 = + 1*(011)*(1*(011)*)* R2 = (1 + 011)* Proof: R1 = + 1*(011)*(1*(011)*)* = (1*(011)*)* = (1 + 011)* = R2 Example: Prove the identity (1 + 00*1) + (1 + 00*1)(0 +10*1)*(0 + 10*1) = 0*1(0 + 10*1)* Proof: LHS = (1 + 00*1)[+ (0 + 10*1)*(0 + 10*1)] = [(+ 00*)1][+ (0 + 10*1)*(0 + 10*1)] = 0*1(0 + 10*1)* 13 Transition Graphs Recognizing Regular Sets Theorem: Every regular expression R can be recognized by a transition graph i Proof: (a) R = . (b) R = . (c) R = G G H H (a) Graphs recognizing P and Q. (b) A graph recognizing P+Q. G i. H (c) A graph recognizing PQ. G 14 (d) A graph recognizing P*. Example Example: Construct a transition graph recognizing R = (0 + 1(01)*)* 0 P A A B B C T 1 C D (a) R = P*; P = 0 + 1(01)*. (c) Q = 1T; T = (01)*. 0 A B 0 C A B C 1 Q (b) P = 0 + Q; Q = 1(01)*. 1 D E (d) Final step. F 0 15 Example (Contd.) Example: Prove that (P + Q)* = P*(QP*)* P,Q P Q Q Q P P (a) Graph recognizing P*(QP*)*. P P (b) Equivalent graph with no -transitions. P,Q (a) Equivalent deterministic graph recognizing (P + Q)*. 16 Informal Techniques Example: Construct a graph that recognizes P = (01 + (11 + 0)1*0)*11 1 Graph for Q = (11 + 0)1*0 B 1 1 A C 0 0 D Graph for P 0 D 1 A 1 1 B 1 1 0 C 0 E 1 F 17 Example Example: Construct a graph that recognizes R = (1(00)*1 + 01*0)* 1 1 0 0 B 0 C B 0 C 0 A 0 A 1 1 F D 1 F D 1 0 1 1 0 0 E (a) Partial graph. 0 E (b) Complete graph. 18 Regular Sets Corresponding to Transition Graphs The set of strings that can be recognized by a transition graph (hence, an FSM) is a regular set Theorem: Let Q, P, and R be regular expressions on a finite alphabet. Then, if P does not contain : • Equation R = Q + RP has a unique solution given by R = QP* • Equation R = Q + PR has a unique solution given by R = P*Q 19 Systems of Equations Example: Derive the set of strings derived by the following transition graph 0 1 A 1 0 C B A = + A0 + B1 (1) B = A0 + B1 + C0 (2) C = B0 (3) Substituting (3) into (2): B = A0 + B1 + B00 = A0 + B(1 + 00) (4) From the theorem: B = A0(1 + 00)* (5) Substituting (5) into (1): A = + A0 + A0(1 + 00)*1 = + A(0 + 0(1 + 00)*1) From the theorem: A = (0 + 0(1 + 00)*1)* = (0 + 0(1 + 00)*1)* (7) Hence, solution C from (7), (5) and (3): C = (0 + 0(1 + 00)*1)*0(1 + 00)*0 0 0 (6) 20 Theorem Theorem: The set of strings that take an FSM M from an arbitrary state Si to another state Sj is a regular set • Combining the two theorems: – An FSM recognizes a set of strings if and only if it is a regular set Applications: the correspondence between regular sets and FSMs enables us to determine whether certain sets are regular Example: Let R denote a regular set on alphabet A that can be recognized by machine M1 • Complement R’: set containing all the strings on A that are not contained in R • R’ describes a regular set: since it can be recognized by a machine M2, which is obtained from M1 by complementing the output values associated with the states of M1 21 Examples Example: Let P&Q represent the intersection of sets P and Q • • Prove P&Q is regular Since P’ and Q’ are regular: – P’ + Q’ is regular – Hence, (P’ + Q’)’ is regular – Since P&Q = (P’ + Q’)’: P&Q is regular Regular expressions containing complementation, intersection, union, concatenation, closure: extended regular expressions Example: Consider the set of strings on {0,1} s.t. no string in the set contains three consecutive 0’s • • Set can be described by: [(0 + 1)*000(0 + 1)*]’ More complicated expression if complementation not used: (1 + 01 + 001)*( + 0 + 00) 22 Example Example: Let M be an FSM whose input/output alphabet is {0,1}. Assume the machine has a designated starting state. Let z1z2…zn denote the output sequence produced by M in response to input sequence x1x2…xn. Define a set SM, which consists of all the strings w s.t. w = z1x1z2x2…znxn for any x1x2…xn in (0 + 1)*. Prove that SM is regular. • • • Given the state diagram of M: replace each directed arc with two directed arcs and a new state, as shown in the figure Retain the original starting state: designate all the original states as accepting states The resulting nondeterministic graph recognizes SM: thus SM is regular Replace A x/z B with A z x B 23 Example (Contd.) Example (contd.): Derive SN for machine N shown below 1/0 0/1 A 0/0 B 1/1 1 E A 0 1 1 1 C DF 0 A B 0 D 1 0 0 0,1 0 1 0 1 F (a) Transition graph. 0 CE 1 B (b) Equivalent deterministic form. 24 Two-way Recognizers Two-way recognizer (or two-way machine): consists of a finite-state control coupled through a head to a tape • • • • • • • Initially: the finite-state control is in its designated starting state, with its head scanning the leftmost square of the tape The machine then proceeds to read the symbols of the tape: one at a time In each cycle of computation: the machine examines the symbol currently scanned by the head, shifts the head one square to the right or left, and then enters a new (not necessarily distinct) state If the machine eventually moves off the tape on the right end entering an accepting state: the tape is accepted by the machine A machine can reject a tape: either by moving off its right end while entering a rejecting state or by looping within the tape Null string can be represented either by: the absence of an input tape or by a completely blank tape A machine accepts if and only if: its starting state is an accepting state 25 Example Example: A two-way machine recognizing set 100* c 0 c 1 1 A A A A B D D D D B A B (a) A loop. (b) Rejection of a tape. 26 Convenience of Using Two-way Machines Two-way machines are as powerful as one-way machines w.r.t the class of tapes they can recognize • However, for some computations: it is convenient to use two-way machines since they may require fewer states Example: Consider the two-way machine shown in the table, which accepts a tape if and only if it contains at least three 1’s and at least two 0’s • The minimal one-way machine that is equivalent to the two-way machine has 12 states: since it must examine the tapes for the appropriate number of 0’s and 1’s simultaneously c 1 0 0 1 0 0 A A B B B C C (a) Rejecting a tape. C c 1 0 1 1 A A B B C D D D D E E F F 0 0 F G (b) Accepting a tape. 27 G