Module 12 • Computation and Configurations – Formal Definition – Examples 1 Definitions • Configuration – Functional Definition • Given the original program and the current configuration of a computation, someone should be able to complete the computation – Contents of a configuration for a C++ program • current instruction to be executed • current value of all variables • Computation – Complete sequence of configurations 2 Computation 1 1 int main(int x,y) { 2 int r = x % y; 3 if (r== 0) goto 8; 4 x = y; 5 y = r; 6 r = x % y; 7 goto 3; 8 return y; } Input: 10 3 • • • • • • • • • • Line 1, x=?,y=?,r=? Line 2, x=10, y=3,r=? Line 3, x=10, y=3, r=1 Line 4, x=10, y=3, r=1 Line 5, x= 3, y=3, r=1 Line 6, x=3, y=1, r=1 Line 7, x=3, y=1, r=0 Line 3, x=3, y=1, r=0 Line 8, x=3, y=1, r=0 Output is 1 3 Computation 2 int main(int x,y) { 2 int r = x % y; 3 if (r== 0) goto 8; 4 x = y; 5 y = r; 6 r = x % y; 7 goto 3; 8 return y; } Input: 53 10 • • • • • • • • • Line 1, x=?,y=?,r=? Line 2, x=53, y=10, r=? Line 3, x= 53, y=10, r=3 Line 4, x=53, y=10, r=3 Line 5, x=10, y=10, r=3 Line 6, x=10, y=3, r=3 Line 7, x=10, y=3, r=1 Line 3, x=10, y=3, r=1 ... 4 Computations 1 and 2 • • • • • • • • • • Line 1, x=?,y=?,r=? Line 2, x=10, y=3,r=? Line 3, x=10, y=3, r=1 Line 4, x=10, y=3, r=1 Line 5, x= 3, y=3, r=1 Line 6, x=3, y=1, r=1 Line 7, x=3, y=1, r=0 Line 3, x=3, y=1, r=0 Line 8, x=3, y=1, r=0 Output is 1 • • • • • • • • • Line 1, x=?,y=?,r=? Line 2, x=53, y=10, r=? Line 3, x= 53, y=10, r=3 Line 4, x=53, y=10, r=3 Line 5, x=10, y=10, r=3 Line 6, x=10, y=3, r=3 Line 7, x=10, y=3, r=1 Line 3, x=10, y=3, r=1 ... 5 Observation int main(int x,y) { 2 int r = x % y; 3 if (r== 0) goto 8; 4 x = y; 5 y = r; 6 r = x % y; 7 goto 3; 8 return y; } • Program and current configuration – Together, these two pieces of information are enough to complete the computation – Are they enough to determine what the original input was? • No! • Both previous inputs, 10 3 as well as 53 10 eventually reached the same configuration (Line 3, x=10, y=3, r=1) • Line 3, x= 10, y=3, r=1 6 Module 13 • Studying the internal structure of REC, the set of solvable problems – Complexity theory overview – Automata theory preview • Motivating Problem – string searching 7 Studying REC Complexity Theory Automata Theory 8 Current picture of all languages All Languages REC RE-REC Solvable Half Solvable All languages - RE Not even half solvable Which language class should be studied further? 9 Complexity Theory REC RE - All languages REC - RE • In complexity theory, we differentiate problems by how hard a problem is to solve – Remember, all problems in REC are solvable • Which problem is harder and why? – Max: • Input: list of n numbers • Task: return largest of the n numbers – Element • Input: list of n numbers • Task: return any of the n numbers 10 Resource Usage * • How do we formally measure the hardness of a problem? – We measure the resources required to solve input instances of the problem • Typical resources are? • Need a notion of size of an input instance – Obviously larger input instances require more resources to solve 11 Poly Language Class * Poly REC RE - All languages REC - RE Rest of REC Informal Definition: A problem L1 is easier than problem L2 if problem L1 can be solved in less time than problem L2. Poly: the set of problems which can be solved in polynomial time (typically referred to as P, not Poly) Major goal: Identify whether or not a problem belongs to Poly 12 Working with Poly Poly Rest of REC • How do you prove a problem L is in Poly? • How do you prove a problem L is not in Poly? – We are not very good at this. – For a large class of interesting problems, we have techniques (polynomial-time answerpreserving input transformations) that show a problem L probably is not in Poly, but few which prove it. 13 Examples • Shortest Path Problem • Input – Graph G – nodes s and t • Task – Find length of shortest path from s to t in G Poly Rest of REC • Longest Path Problem • Input – Graph G – nodes s and t • Task – Find length of longest path from s to t in G Which problem is provably solvable in polynomial time? 14 Automata Theory REC RE - All languages REC - RE • In automata theory, we will define new models of computation which we call automata – Finite State Automata (FSA) – Pushdown Automata (PDA) • Key concept – FSA’s and PDA’s are restricted models of computation • FSA’s and PDA’s cannot solve all the problems that C++ programs can – We then identify which problems can be solved using FSA’s and PDA’s 15 New language classes REC RE - All languages REC - RE • REC is the set of solvable languages when we start with a general model of computation like C++ programs • We want to identify which problems in REC can be solved when using these restricted automata Solvable Solvable by FSA’s by and PDA’s PDA’s Rest of REC 16 Recap * • Complexity Theory – Studies structure of the set of solvable problems – Method: analyze resources (processing time) used to solve a problem • Automata Theory – Studies structure of the set of solvable problems – Method: define automata with restricted capabilities and resources and see what they can solve (and what they cannot solve) – This theory also has important implications in the development of programming languages and compilers 17 Motivating Problem String Searching 18 String Searching • Input – String x – String y • Tasks – Return location of y in string x – Does string y occur in string x? • Can you identify applications of this type of problem in real life? • Try and develop a solution to this problem. 19 String Searching II • Input – String x – pattern y • Tasks – Return location of y in string x – Does pattern y occur in string x? • Pattern – [anything].html – $EN4$$ • Try and develop a solution to this problem. 20 String Searching • We will show an easy way to solve these string searching problems • In particular, we will show that we can solve these problems in the following manner – write down the pattern – the computer automatically turns this into a program which performs the actual string search 21 Module 14 • Regular languages – Inductive definitions – Regular expressions • syntax • semantics 22 Regular Languages (Regular Expressions) 23 Regular Languages • New language class – Elements are languages • We will show that this language class is identical to LFSA – Language class to be defined by Finite State Automata (FSA) – Once we have shown this, we will use the term “regular languages” to refer to this language class 24 Inductive Definition of Integers * • Base case definition – 0 is an integer • Inductive case definition – If x is an integer, then • x+1 is an integer • x-1 is an integer • Completeness – Only numbers generated using the above rules are integers 25 Inductive Definition of Regular Languages • Base case definition – – – – Let S denote the alphabet {} is a regular language {l} is a regular language {a} is a regular language for any character a in S • Inductive case definition – If L1 and L2 are regular languages, then • L1 union L2 is a regular language • L1 concatenate L2 is a regular language • L1* is a regular language • Completeness – Only languages generated using above rules are regular languages 26 Proving a language is regular * • Prove that {aa, bb} is a regular language – {a} and {b} are regular languages • base case of definition – {aa} = {a}{a} is a regular language • concatenation rule – {bb} = {b}{b} is a regular language • concatenation rule – {aa, bb} = {aa} union {bb} is a regular language • union rule • Typically, we will not go through this process to prove a language is regular 27 Regular Expressions • How do we describe a regular language? – Use set notation • {aa, bb, ab, ba}* • {a}{a,b}*{b} – Use regular expressions R • Inductive def of regular languages and regular expressions on page 72 • (aa+bb+ab+ba)* • a(a+b)*b 28 R and L(R) * • How we interpret a regular expression – What does a regular expression R mean to us? • aaba represents the regular language {aaba} • f represents the regular language {} • aa+bb represents the regular language {aa, bb} – We use L(R) to denote the regular language represented by regular expression R. 29 Precedence rules • What is L(ab+c*)? – Possible answers: • • • • {a}({b} union {c}*} ({a}{b,c})* ({ab} union {c})* {ab} union {c}* – Must know precedence rules • * first, then concatenation, then + 30 Precedence rules continued • Precedence rules similar to those for arithmetic expressions – ab+c2 • (a times b) + (c times c) • exponentiation first, then multiplication, then addition • Think of Kleene closure as exponentiation, concatenation as multiplication, and union as addition and the precedence rules are identical 31 Regular expressions are strings * • Let L be a regular language over the alphabet S – A regular expression R for L is just a string over the alphabet S union {(, ), +, *, f, l} which follows certain syntactic rules • That is, the set of legal regular expressions is itself a language over the alphabet S union {(, ), +, *} – f, a*aba are strings in the language of legal reg. exp. – )(, *a* are strings NOT in the language of legal reg. exp. 32 Semantics * • We give a regular expression R meaning when we interpret it to represent L(R). – aaba is just a string – we interpret it to represent the language {aaba}. • We do similar things with arithmetic expressions – 10+72 is just a string – We interpret this string to represent the number 59 33 Key fact * • A language L is a regular language iff there exists a reg. exp. R such that L(R) = L – When I ask for a proof that a language L is regular, rather than going through the inductive proof we saw earlier, I expect you to give me a regular expression R s.t. L(R) = L 34 Summary • Regular expressions are strings – syntax for legal regular expressions – semantics for interpreting regular expressions • Regular languages are a new language class – A language L is regular iff there exists a regular expression R s.t. L(R) = L • We will show that the regular languages are identical to LFSA 35 Module 15 • FSA’s – Defining FSA’s – Computing with FSA’s • Defining L(M) – Defining language class LFSA – Comparing LFSA to set of solvable languages (REC) 36 Finite State Automata New Computational Model 37 Tape • We assume that you have already seen FSA’s in CSE 260 – If not, review material in reference textbook • Only data structure is a tape – Input appears on tape followed by a B character marking the end of the input – Tape is scanned by a tape head that starts at leftmost cell and always scans to the right 38 Data type/States • The only data type for an FSA is char • The instructions in an FSA are referred to as states • Each instruction can be thought of as a switch statement with several cases based on the char being scanned by the tape head 39 Example program 1 switch(current tape cell) { case a: goto 2 case b: goto 2 case B: return yes } 2 switch (current tape cell) { case a: goto 1 case b: goto 1 case B: return no; } 40 New model of computation 1 switch(current tape cell) { case a: goto 2 case b: goto 2 case B: return yes } 2 switch (current tape cell) { case a: goto 1 case b: goto 1 case B: return no; } • FSA M=(Q,S,q0,A,d) – Q = set of states = {1,2} – S = character set = {a,b} • don’t need B as we see below – q0 = initial state = 1 – A = set of accepting (final) states = {1} • A is the set of states where we return yes on B • Q-A is set of states that return no on B – d = state transition function 41 Textual representations of d * 1 switch(current tape cell) { case a: goto 2 case b: goto 2 case B: return yes d(1,a) = 2, d(1,b)=2, d(2,a)=1, d(2,b) = 1 {(1,a,2), (1,b,2), (2,a,1), (2,b,1)} } 2 switch (current tape cell) { case a: goto 1 case b: goto 1 case B: return no; } 1 2 a b 2 1 2 1 42 Transition diagrams 1 switch(current tape cell) { case a: goto 2 case b: goto 2 case B: return yes } 2 switch (current tape cell) { case a: goto 1 case b: goto 1 case B: return no; } a,b 2 1 a,b Note, this transition diagram represents all 5 components of an FSA, not just the transition function d 43 Exercise • FSA M = (Q, S, q0, A, d) – – – – – Q = {1, 2, 3} S = {a, b} q0 = 1 A = {2,3} d: {d(1,a) = 1, d(1,b) = 2, d(2,a)= 2, d(2,b) = 3, d(3,a) = 3, d(3,b) = 1} • Draw this FSA as a transition diagram 44 Transition Diagram a a 1 b 2 b b 3 a 45 Computing with FSA’s 46 Computation Example * a a 1 b 2 b b 3 a Input: aabbaa 47 Computation of FSA’s in detail • A computation of an FSA M on an input x is a complete sequence of configurations • We need to define – Initial configuration of the computation – How to determine the next configuration given the current configuration – Halting or final configurations of the computation 48 Initial Configuration a a b 1 2 b b 3 FSA M a • Given an FSA M and an input string x, what is the initial configuration of the computation of M on x? – (q0,x) – Examples • • • • • • x = aabbaa (1, aabbaa) x = abab (1, abab) x=l (1, l) 49 Definition of ├M • (1, aabbaa) ├M (1, abbaa) a – config 1 “yields” config 2 in one step using FSA M 3 • (1,aabbaa) ├ M (2, baa) a b 2 – config 1 “yields” config 2 in 3 steps using FSA M • (1, aabbaa) ├* M (2, baa) 1 b b 3 FSA M a – config 1 “yields” config 2 in 0 or more steps using FSA M • Comment: – ├ M determined by transition function d – There must always be one and only one next configuration • If not, M is not an FSA 50 Halting Configurations * a • Halting configuration a b 1 2 b b 3 FSA M a – (q, l) – Examples • (1, l) • (3, l) • Accepting Configuration – State in halting configuration is in A • Rejecting Configuration – State in halting configuration is not in A 51 a a FSA M on x b b FSA M b a • Two possibilities for M running on x – M accepts x • M accepts x iff the computation of M on x ends up in an accepting configuration • (q0, x) ├*M (q, l) where q is in A – M rejects x • M rejects x iff the computation of M on x ends up in a rejecting configuration • (q0, x) ├M (q, l) where q is not in A * – M does not loop or crash on x • Why? 52 a a Examples b b FSA M b a – For the following input strings, does M accept or reject? • l • aa • aabba • aab • babbb 53 Definition of d*(q, x) a b 1 b a • Notation from the book • d(q, c) = p • dk(q, x) = p 2 – k is the length of x b 3 FSA M a • d*(q, x) = p • Examples – – – – – d(1, a) = 1 d(1, b) = 2 d4(1, abbb) = 1 d*(1, abbb) = 1 d*(2, baaaaa) = 3 54 a L(M) and LFSA * • L(M) or Y(M) a b b FSA M b a – The set of strings M accepts • Basically the same as Y(P) from previous unit – We say that M accepts/decides/recognizes/solves L(M) • Remember an FSA will not loop or crash – What is L(M) (or Y(M)) for the FSA M above? • N(M) – Rarely used, but it is the set of strings M rejects • LFSA – L is in LFSA iff there exists an FSA M such that L(M) = L. 55 LFSA Unit Overview • Study limits of LFSA – Understand what languages are in LFSA • Develop techniques for showing L is in LFSA – Understand what languages are not in LFSA • Develop techniques for showing L is not in LFSA • Prove Closure Properties of LFSA • Identify relationship of LFSA to other language classes 56 Comparing language classes Showing LFSA is a subset of REC, the set of solvable languages 57 LFSA subset REC • Proof – Let L be an arbitrary language in LFSA – Let M be an FSA such that L(M) = L • M exists by definition of L in LFSA – – – – Construct C++ program P from FSA M Argue P solves L There exists a C++ program P which solves L L is solvable 58 Visualization •Let L be an arbitrary language in LFSA •Let M be an FSA such that L(M) = L •M exists by definition of L in LFSA •Construct C++ program P from FSA M •Argue P solves L •There exists a program P which solves L •L is solvable L L LFSA M FSA’s REC P C++ Programs 59 Construction FSA M Program Construction P Algorithm • The construction is an algorithm which solves a problem with a program as input – Input to A: FSA M – Output of A: C++ program P such that P solves L(M) – How do we do this? 60 Comparing computational models • The previous slides show one method for comparing the relative power of two different computational models – Computational model CM1 is at least as general or powerful as computational model CM2 if • Any program P2 from computational model CM2 can be converted into an equivalent program P1 in computational model CM1. – Question: How can we show two computational models are equivalent? 61 Module 16 • Distinguishability – Definition – Help in designing/debugging FSA’s 62 Distinguishability 63 Questions • Let L be the set of strings over {a,b} which end with aaba. • Let M be an FSA such that L(M) = L. • Questions – Can aaba and aab end up in the same state of M? Why or why not? – How about aa and aab? – How about l or a? – How about b or bb? – How about l or bbab? 64 Definition * • String x is distinguishable from string y with respect to language L iff there exists a string z such that – xz is in L and yz is not in L OR – xz is not in L and yz is in L • When reviewing, identify the z for pair of strings on the previous slide 65 Questions • Let L be the set of strings over {a,b} that have length 2 mod 5 or 4 mod 5. • Let M be an FSA such that L(M) = L. • Questions – Are aa and aab distinguishable with respect to L? Can they end up in the same state of M? – How about aa and aaba? – How about l and a? – How about b and aabbaa? 66 Design an FSA to accept L • L = set of strings x over {a,b} such that length of x is 2 or 4 mod 5 • One design method – Is l in L? • Implication? – Is a distinguishable from l wrt L? • Implication? – Is b distinguishable from l wrt L? • Implication? – Is b distinguishable from a wrt L? • Implication? 67 Design an FSA to accept L • L = set of strings x over {a,b} such that length of x is 2 or 4 mod 5 • Design continued – Is aa distinguishable from l wrt L? • Implication? – Is aa distinguishable from a wrt L? • Implication? 68 Design an FSA to accept L • L = set of strings x over {a,b} such that length of x is 2 or 4 mod 5 • Design continued – What strings would we compare ab to? – What results do we get? – Implications? – How about ba? – How about bb? 69 Design an FSA to accept L • L = set of strings x over {a,b} such that length of x is 2 or 4 mod 5 • Design continued – We can continue in this vein, but it could go on forever – Now lets try something different – Consider string l. • What set of strings are indistinguishable from it wrt L? • Implications? 70 Design an FSA to accept L • L = set of strings x over {a,b} such that length of x is 2 or 4 mod 5 • Design continued – Consider string a. • What set of strings are indistinguishable from it wrt L? • Implications? – Consider string aa. • What set of strings are indistinguishable from it wrt L? • Implications? 71 Debugging an FSA • Do essentially the same thing – Identify some strings which end up in each state – Try and generalize each state to describe the language of strings which end up at that state. 72 Example 1 a,b I a a II III b IV a V b b a b VI a,b 73 Example 2 b I a a a II III a,b b IV a V b b 74 Example 3 a,b a II b I a IV a b III b 75 Module 17 • Closure Properties of Language class LFSA – Remember ideas used in solvable languages unit – Set complement – Set intersection, union, difference, symmetric difference 76 LFSA is closed under set complement • If L is in LFSA, then Lc is in LFSA • Proof – Let L be an arbitrary language in LFSA – Let M be the FSA such that L(M) = L • M exists by definition of L in LFSA – – – – Construct FSA M’ from M Argue L(M’) = Lc There exists an FSA M’ such that L(M’) = Lc Lc is in LFSA 77 Visualization L Lc •Let L be an arbitrary language in LFSA •Let M be the FSA such that L(M) = L •M exists by definition of L in LFSA •Construct FSA M’ from M •Argue L(M’) = Lc •Lc is in LFSA LFSA M M’ FSA’s 78 Construct FSA M’ from M • What did we do when we proved that REC, the set of solvable languages, is closed under set complement? • Construct program P’ from program P • Can we translate this to the FSA setting? 79 Construct FSA M’ from M • M = (Q, S, q0, A, d) • M’ = (Q’, S’, q’, A’, d’) – M’ should say yes when M says no – M’ should say no when M says yes – How? • Q’ = Q • S’ = S • q’ = q0 • d’ = d • A’ = Q-A 80 Q’ = Q S’ = S q’ = q0 d’ = d A’ = Q-A Example a a 1 b 2 b b a a 1 b 2 b 3 FSA M a b 3 FSA M’ a 81 Construction is an algorithm * FSA M FSA Construction M’ Algorithm • Set Complement Construction – Algorithm Specification • Input: FSA M • Output: FSA M’ such that L(M’) = L(M)c – Comments • This algorithm can be in any computational model. – It does not have to be (and typically is not) an FSA • These set closure constructions are useful. – More on this later 82 Specification of the algorithm FSA M FSA Construction M’ Algorithm • Your algorithm must give a complete specification of M’ in terms of M – Example: • Let input FSA M = (Q, S, q0, A, d) • Output FSA M’ = (Q’, S’, q’, A’, d’) where – Q’ = Q – S’ = S – q’ = q0 – d’ = d – A’ = Q-A • When I ask for such a construction algorithm specification, this type of answer is what I am looking for. Further algorithmic details on how such an algorithm would work are unnecessary. 83 LFSA closed under Set Intersection Operation (also set union, set difference, and symmetric difference) 84 LFSA closed under set intersection operation * • Let L1 and L2 be arbitrary languages in LFSA • Let M1 and M2 be FSA’s s.t. L(M1) = L1, L(M2) = L2 – M1 and M2 exist by definition of L1 and L2 in LFSA • • • • Construct FSA M3 from FSA’s M1 and M2 Argue L(M3) = L1 intersect L2 There exists FSA M3 s.t. L(M3) = L1 intersect L2 L1 intersect L2 is in LFSA 85 Visualization •Let L1 and L2 be arbitrary languages in LFSA •Let M1 and M2 be FSA’s s.t. L(M1) = L1, L(M2) = L2 •M1 and M2 exist by definition of L1 and L2 in LFSA •Construct FSA M3 from FSA’s M1 and M2 •Argue L(M3) = L1 intersect L2 •There exists FSA M3 s.t. L(M3) = L1 intersect L2 •L1 intersect L2 is in LFSA L1 L1 intersect L2 L2 LFSA M1 M3 M2 FSA’s 86 Algorithm Specification • Input – Two FSA’s M1 and M2 • Output – FSA M3 such that L(M3) = L(M1) intersection L(M2) FSA M1 FSA M2 Alg FSA M3 87 Use Old Ideas FSA M1 FSA M2 Alg FSA M3 • Key concept: Try ideas from previous closure property proofs • Example – How did the algorithm that was used to prove that REC is closed under set intersection work? – If we adapt this approach, what should M3 do with respect to M1, M2, and the input string? 88 Run M1 and M2 Simultaneously 0 l Alg FSA M3 1 1 0 FSA M1 FSA M2 0 1 0 1 0 2 l,A 0,A 1,A 2,A l,B 0,B 1,B 2,B 1 1 A M1 0 M2 0,1 B M3 What happens when M1 and M2 run on input string 11010? 89 0 Construction * • Input – FSA M1 = (Q1, S1, q1, d1, A1) – FSA M2 = (Q2, S2, q2, d2, A2) l 1 0 0 1 1 1 1 1 M1 0,1 A 0 B M2 0 0 2 • Output – FSA M3 = (Q3, S3, q3, d3, A3) – What is Q3? • Q3 = Q1 X Q2 where X is cartesian product • In this case, Q3 = {(l,A), (l,B), (0,A), (0,B), (1,A), (1,B), (2,A), (2,B)} – What is S3? • S3 = S1 = S 2 • In this case, S3 = {0,1} 90 0 Construction * • Input – FSA M1 = (Q1, S1, q1, d1, A1) – FSA M2 = (Q2, S2, q2, d2, A2) l 1 0 0 1 1 1 1 1 M1 0,1 A 0 B M2 0 0 2 • Output – FSA M3 = (Q3, S3, q3, d3, A3) – What is q3? • q3 = (q1, q2) • In this case, q3 = (l,A) – What is A3? • A3 = {(p, q) | p in A1 and q in A2} • In this case, A3 = {(0,B)} 91 0 Construction • Input – FSA M1 = (Q1, S1, q1, d1, A1) – FSA M2 = (Q2, S2, q2, d2, A2) l 1 0 0 1 1 1 1 1 M1 0,1 A 0 B M2 0 0 2 • Output – FSA M3 = (Q3, S3, q3, d3, A3) – What is d3? • For all p in Q1, q in Q2, a in S, d3((p,q),a) = (d1(p,a),d2(q,a)) • In this case, – d3((0,A),0) = (d1(0,0),d2(A,0)) – = (0,B) – d3((0,A),1) = (d1(0,1),d2(A,1)) – = (1,A) 92 Example Summary 0 l 1 0 1 1 0 1 0 1 0 2 l,A 1 1 A 0,A 0 M1 0 M2 1 1,A 0 2,A 0 1 0 0,1 B 1 l,B 0 0,B 1 1 1,B 0 2,B M3 93 Observation • Input – FSA M1 = (Q1, S1, q1, d1, A1) – FSA M2 = (Q2, S2, q2, d2, A2) • Output – FSA M3 = (Q3, S3, q3, d3, A3) – What is A3? • A3 = {(p, q) | p in A1 and q in A2} • What if operation were different? – Set union, set difference, symmetric difference 94 Observation continued * • Input – FSA M1 = (Q1, S1, q1, d1, A1) – FSA M2 = (Q2, S2, q2, d2, A2) • Output – FSA M3 = (Q3, S3, q3, d3, A3) – What is A3? • • • • Set intersection: A3 = {(p, q) | p in A1 and q in A2} Set union: A3 = {(p, q) | p in A1 or q in A2} Set difference: A3 = {(p, q) | p in A1 and q not in A2} Symmetric difference: A3 = {(p, q) | (p in A1 and q not in A2) or (p not in A1 and q in A2) } 95 Observation conclusion • LFSA is closed under – – – – set intersection set union set difference symmetric difference • The constructions used to prove these closure properties are essentially identical 96 Comments * • You should be able to execute this algorithm – Convert two FSA’s into a third FSA with the correct properties. • You should understand the idea behind this algorithm – The third FSA essentially runs both input FSA’s simultaneously on any input string – How we set A3 depending on the specific set operation • You should understand how this algorithm can be used to simplify design of FSA’s • You should be able to construct new algorithms for new closure property proofs 97 Comparison * L1 L1 intersect L2 L L L2 LFSA REC LFSA M1 M3 M2 M FSA’s P C++ Programs FSA’s 98 Module 18 • NFA’s – nondeterministic transition functions • computations are trees, not paths – L(M) and LNFA • LFSA subset of LNFA – Comparing computational models 99 Nondeterministic Finite State Automata NFA’s 100 Change: d is a relation • For an FSA M, d(q,a) results in one and only one state for all states q and characters a. – That is, d is a function • For an NFA M, d(q,a) can result in a set of states – That is, d is now a relation – Next step is not determined (nondeterministic) 101 Example NFA a,b a,b a a b a •Why is this only an NFA and not an FSA? Identify as many reasons as you can. 102 Computing with NFA’s • Configurations: same as they are for FSA’s • Computations are different – Initial configuration is identical – However, there may be several next configurations or there may be none. • Computation is no longer a “path” but is now a “graph” (often a tree) rooted at the initial configuration – Definition of halting, accepting, and rejecting configurations is identical – Definition of acceptance must be modified 103 a,b Computation Graph (Tree) a,b a a b a Input string aaaaba (1, aaaaba) (1, aaaba) (2, aaaba) (1, aaba) (2, aaba) (3, aaba) (1, aba) (2, aba) (3, aba) (1, ba) (2, ba) (1, a) (1, l) crash (3, ba) (4, a) (2, l) (5, l) 104 a,b Definition of ├ unchanged * a a b a Input string aaaaba (1, aaaaba)├ (1, aaaba) (1, aaaaba) (1, aaaba) a,b (1, aaaaba)├ (2, aaaba) (2, aaaba) (1, aaaaba)├3 (1, aba) (1, aaba) (2, aaba) (1, aaaaba)├3 (3, aba) (3, aaba) (1, aaaaba)├* (2, aba) (1, aba) (2, aba) (1, ba) (2, ba) (3, aba) crash (1, aaaaba)├* (3, aba) (1, aaaaba)├* (1, l) (3, ba) (1, aaaaba)├* (5, l) (1, a) (1, l) (4, a) (2, l) (5, l) 105 a,b a Acceptance and Rejection (1, aaba) (1, aba) (2, aba) (1, ba) (2, ba) (1, a) (1, l) b a M accepts string x if one of the configurations reached is an accepting configuration (2, aaaba) (2, aaba) a Input string aaaaba (1, aaaaba) (1, aaaba) a,b (q0, x) ├* (f, l),f in A (3, aaba) (3, aba) crash (3, ba) M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations (4, a) (2, l) (5, l) 106 Comparison b a a a a,b b a b b FSA a,b a,b a a b a NFA 107 Defining L(M) and LNFA • M accepts string x if one of the configurations reached is an accepting configuration – (q0, x) |-* (f, l),f in A • M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations • L(M) (or Y(M)) • N(M) • LNFA – Language L is in language class LNFA iff 108 Comparing language classes LFSA subset of LNFA 109 LFSA subset LNFA • Let L be an arbitrary language in LFSA • Let M be the FSA such that L(M) = L – M exists by definition of L in LFSA • • • • Construct an NFA M’ such that L(M’) = L Argue L(M’) = L There exists an NFA M’ such that L(M’) = L L is in LNFA – By definition of L in LNFA 110 Visualization •Let L be an arbitrary language in LFSA •Let M be an FSA such that L(M) = L •M exists by definition of L in LFSA •Construct NFA M’ from FSA M •Argue L(M’) = L •There exists an NFA M’ such that L(M’) =L •L is in LNFA L LFSA M FSA’s L LNFA M’ NFA’s 111 Construction • We need to make M into an NFA M’ such that L(M’) = L(M) • How do we accomplish this? 112 Module 19 • LNFA subset of LFSA – Theorem 4.1 on page 131 of Martin textbook – Compare with set closure proofs • Main idea – A state in FSA represents a set of states in original NFA 113 LNFA subset LFSA • Let L be an arbitrary language in • Let M be – M exists by definition of • • • • Construct an M’ such that L(M’) Argue L(M’) = There exists an M’ such that L(M’) = L is in – By definition of 114 Visualization •Let L be an arbitrary language in LNFA •Let M be an NFA such that L(M) = L •M exists by definition of L in LNFA •Construct FSA M’ from NFA M •Argue L(M’) = L •There exists an FSA M’ such that L(M’) = L •L is in LFSA L LNFA M NFA’s L LFSA M’ FSA’s 115 Construction Specification • We need an algorithm which does the following – Input: NFA M – Output: FSA M’ such that L(M’) = L(M) 116 a,b a,b a Difficulty * (2, aaaba) (1, aaba) (2, aaba) (3, aaba) (1, aba) (2, aba) (3, aba) (1, ba) (2, ba) (1, a) (1, l) b a Input string aaaaba • An NFA can be in several states after processing an input string x (1, aaaaba) (1, aaaba) a crash (3, ba) (4, a) (2, l) (5, l) 117 a,b a,b a Observation * (2, aaaba) (1, aaba) (2, aaba) (3, aaba) (1, aba) (2, aba) (3, aba) (1, ba) (2, ba) (1, a) (1, l) b a Input string aaaaba • All strings which end up in the set of states {1,2,3} are indistinguishable with respect to L(M) (1, aaaaba) (1, aaaba) a crash (3, ba) (4, a) (2, l) (5, l) 118 a,b a,b a Idea a b a Input string aaaaba • Given an NFA M = (Q,S,q0,d,A), the equivalent FSA M’ will have state set 2Q (one state for each subset of Q) • Example – In this case there are 5 states in Q – 2Q, the set of all subsets of Q, has 25 elements including {} and Q – The FSA M’ will have 25 states • What strings end up in state {1,2,3} of M’? – The strings which end up in states 1, 2, and 3 of NFA M. – In this case, strings which do not contain aaba and end with aa such as aa, aaa, and aaaa. 119 a,b a,b a Idea Illustrated (1,aaaaba) a b Input string aaaaba ({1}, aaaaba) (1, aaaba) (2, aaaba) (1, aaba) (2, aaba) (3, aaba) ({1,2,3}, aaba) (1, aba) (2, aba) (3, aba) ({1,2,3}, aba) (1, ba) (2, ba) (3, ba) ({1,2,3}, ba) (1, a) (1, l) a (2, l) ({1,2}, aaaba) (4, a) ({1,4}, a) (5, l) ({1,2,5}, l) 120 a,b Construction a,b NFA M 1 a 2 a 3 • Input NFA M = (Q, S, q0, d, A) • Output FSA M’ = (Q’, S’, q’, d’, A’) – What is Q’? • all subsets of Q including Q and {} • In this case, Q’ = – What is S’? • We always make S’ = S • In this case, S’ = S = {a,b} – What is q’? • We always make q’ = {q0} • In this case q’ = 121 a,b Construction a,b NFA M 1 a 2 a 3 • Input NFA M = (Q, S, q0, d, A) • Output FSA M’ = (Q’, S’, q’, d’, A’) – What is A’? • Suppose a string x ends up in states 1 and 2 of the NFA M above. – Is x accepted by M? – Should {1,2} be an accepting state in FSA M’? • Suppose a string x ends up in states 1 and 2 and 3 of the NFA M above. – Is x accepted by M? – Should {1,2,3} be an accepting state in FSA M’? • Suppose p = {q1, q2, …, qk} where q1, q2, …, qk are in Q • p is in A’ iff at least one of the states q1, q2, …, qk is in A • In this case, A’ = 122 a,b Construction a,b NFA M 1 a 2 a 3 • Input NFA M = (Q, S, q0, d, A) • Output FSA M’ = (Q’, S’, q’, d’, A’) – What is d’? • If string x ends up in states 1 and 2 after being processed by the NFA above, where does string xa end up after being processed by the NFA above? • Figuring out d’(p,a) in general – Suppose p = {q1, q2, …, qk} where q1, q2, …, qk are in Q – Then d’(p,a) = d(q1,a) union d(q2,a) union … union d(qk,a) » Similar to 2 FSA to 1 FSA construction – In this case » d’({1,2},a) = 123 Construction Summary a,b a,b NFA M 1 a 2 a 3 • Input NFA M = (Q, S, q0, d, A) • Output FSA M’ = (Q’, S’, q’, d’, A’) – Q’ = all subsets of Q including Q and {} • In this case, Q’ = {{}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}} – S’ = S • In this case, S’ = S = {a,b} – q’ ={q0} • In this case, q’ = {1} – A’ • Suppose p = {q1, q2, …, qk} where q1, q2, …, qk are in Q • p is in A’ iff at least one of the states q1, q2, …, qk is in A – d’ • Suppose p = {q1, q2, …, qk} where q1, q2, …, qk are in Q • Then d’(p,a) = d(q1,a) union d(q2,a) union … union d(qk,a) 124 Example Summary a b a,b 1 a,b a 2 a 3 a {1} b b {1,2} a {1,2,3} a,b NFA M {} b a {1,3} a,b b {2} a {3} a,b {2,3} FSA M’ 125 Example Summary Continued a b a,b 1 a,b a 2 a 3 a {1} b b {1,2} a {1,2,3} a,b NFA M {} b a {1,3} a,b b {2} a {3} a,b {2,3} FSA M’ These states cannot be reached from initial state and are unnecessary. 126 Example Summary Continued a b a,b 1 a,b a 2 NFA M a 3 a {1} b b b {1,2} a {1,2,3} a {1,3} Smaller FSA M’ By examination, we can see that state {1,3} is unnecessary. However, this is a case by case optimization. It is not a general technique or algorithm. 127 Example 2 a,b A a B b C NFA M Step 1: name the three states of NFA M 128 a,b Step 2: transition table A a B b C NFA M {A} {B} {} {B,C} a {B} {B} b {} {B,C} {} {} {B} {B,C} d’({B,C},a) = d(B,a) U d(C,a) = {B} U {} = {B} d’({B,C},b) = d(B,b) U d(C,b) = {B,C} U {} = {B,C} 129 a,b Step 3: accepting states A a B b C NFA M {A} {B} {} {B,C} a {B} {B} b {} {B,C} {} {} {B} {B,C} Which states should be accepting? Any state which includes an accepting state of M, in this case, C. A’ = {{B,C}} 130 a,b Step 4: Answer A a B b C NFA M {A} {B} {} {B,C} a {B} {B} b {} {B,C} Initial state is {A} Set of final states A’ = {{B,C}} {} {} {B} {B,C} This is sufficient. You do NOT need to turn this into a diagram. 131 a,b Step 5: Optional a A a b {B} b C NFA M b a {A} B b a {B,C} a,b {} FSA M’ 132 Comments • You should be able to execute this algorithm – You should be able to convert any NFA into an equivalent FSA. • You should understand the idea behind this algorithm – For an FSA M’, strings which end up in the same state of M’ are indistinguishable wrt L(M’) – For an NFA M, strings which end up in the same set of states of M are indistinguishable wrt L(M) 133 Comments • You should understand the importance of this algorithm – Design tool • We can design using NFA’s • A computer will convert this NFA into an equivalent FSA – FSA’s can be executed by computers whereas NFA’s cannot (or at least cannot easily be run by computers) – Chaining together algorithms • Perhaps it is easy to build NFA’s to accept L1 and L2 • Use this algorithm to turn these NFA’s to FSA’s • Use previous algorithm to build FSA to accept L1 intersect L2 • You should be able to construct new algorithms for new closure property proofs 134 Module 20 • NFA’s with l-transitions – NFA-l’s • Formal definition • Simplifies construction – LNFA-l – Showing LNFA-l is a subset of LNFA (extra credit) • and therefore a subset of LFSA 135 Defining NFA-l’s 136 Change: l-transitions • We now allow an NFA M to change state without reading input • That is, we add the following categories of transitions to d – d(q,l) is allowed 137 Example * a,b l l a,b a,b a a b a b a a b a,b 138 Defining L(M) and LNFA-l • M accepts string x if one of the configurations reached is an accepting configuration – (q0, x) |-* (f, l),f e A • M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations • L(M) or Y(M) • N(M) • LNFA-l – Language L is in language class LNFAl iff 139 LNFA-l subset LFSA • Recap of what we already know – Let M be any NFA – There exists an algorithm A1 which constructs an FSA M’ such that L(M’) = L(M) • New goal – Let M be any NFA-l – There exists an algorithm A2 which constructs an FSA M’ such that L(M’) = L(M) 140 Visualization • Goal – Let M be any NFA-l – There exists an algorithm A2 which constructs an FSA M’ such that L(M’) = L(M) NFA-l M A2 FSA M’ 141 Modified Goal NFA-l M A2 FSA M’ • Question – Can we use any existing algorithms to simplify the task of developing algorithm A2? • Yes, we can use algorithm A1 which converts an NFA M1 into an FSA M’ such that L(M’) = L(M1) NFA-l M A2’ NFA M1 A1 FSA M’ Algorithm A2 142 New Goal (extra credit) A2’ NFA-l M NFA M1 • Difficulty – NFA-l M can make transitions on l – How can the NFA M1 simulate these l-transitions? a 1 l 2 b 3 l 4 l 5 b 6 143 Basic Idea A2’ NFA-l M NFA M1 • For each state q of M and each character a of S, figure out which states are reachable from q taking any number of l-transitions and exactly one transition on that character a. • In the NFA-d M1, directly connect q to each of these states using an arc labeled with a. a 1 l 2 b 3 l 4 b 2 3 5 b 6 b b 1 l 4 5 6 144 Process State 2 NFA-d M1 A2’ NFA-l M a 1 l 2 b 3 l 4 b 1 2 b b 5 b 6 b b 3 l 4 5 6 b 145 Process State 3 NFA-d M1 A2’ NFA-l M a 1 l 2 b 3 l 4 b 1 2 b a b 5 b 6 b a b 3 l 4 5 6 b b 146 Final Picture NFA-d M1 A2’ NFA-l M a 1 l 2 b 3 l 4 b 1 l a 2 b b 3 a b 4 a 5 b a 5 b 6 b 6 b b b 147 a Construction 1 l 2 b 3 l 4 l 5 b • Input NFA-l M = (Q, S, q0, d, A) • Output NFA M1 = (Q1, S1, q1, d1, A1) – What is Q1? • Q1 = Q • In this case, Q1 = {1,2,3,4,5,6} – What is S1? • S1 = S • In this case, S1 = S = {a,b} – What is q1? • We always make q1 = q0 • In this case q1 = 1 148 6 a Construction 1 l 2 b 3 l 4 l 5 b 6 • Input NFA-l M = (Q, S, q0, d, A) • Output NFA M1 = (Q1, S1, q1, d1, A1) – What is d1? • d1(q,a) = the set of states reachable from state q in M taking any number of l-transitions and exactly one transition on the character a – More on this later • In this case – d1(1,a) = {} – d1(1,b) = {3,4,5} – What is A1? • A1 = A with one minor change – If an accepting state is reachable from q0 using only l-transitions, then we make q1 an element of A1 • In this case, using only l-transitions, no accepting state is reachable from q0, so A1 = A 149 Computing d1(q,a) • a 1 l 2 b 3 l 4 l 5 b d1(q,a) = the set of states reachable from state q in M taking 0 or more l-transitions and exactly one transition on the character a – Break this down into three steps • First compute all states reachable from q using 0 or more l-transitions – We call this set of states L(q) • Next, compute all states reachable from any element of L(q) using the character a – We can denote these states as d(L(q),a) • Finally, compute all states reachable from states in d(L(q),a) using 0 or more l-transitions – We denote these states as L(d(L(q),a)) – This is the desired answer 150 6 a Example • 1 l 2 b 3 l 4 l 5 b d1(1,b) = {3,4,5} – Compute L(1), all states reachable from state 1 using 0 or more ltransitions • L(1) = {1,2} – Compute d(L(1),b), all states reachable from any element L(1) of using the character b: • d(L(1),b) = d({1,2},b) • = d(1,b) U d(2,b) • = {} U {3} = {3} – Compute L(d(L(1),b)), all states reachable from states in d(L(1),b) using 0 or more l-transitions • L(d(L(1),b)) = L(3) • = {3,4,5} 151 6 Comments • For extra credit, you should be able to execute this algorithm – Convert any NFA-l into an equivalent NFA. • For extra credit, you should understand the idea behind this algorithm – Why the transition function is computed the way it is – Why A1 may need to include q1 in some cases • You should understand the importance of this algorithm – Compiler role again – Use in combination with previous algorithm for converting any NFA into an equivalent FSA to create a new algorithm for converting any NFA-l into an equivalent FSA 152 LNFA-l = LFSA • Implications – Let us primarily use the term LFSA to refer to this language class – Given a language L is in LFSA • We know there exists an FSA M s.t. L(M) = L • We know there exists an NFA M s.t. L(M) = L – To show a language L is in LFSA • Show there exists an FSA M s.t. L(M) = L • Show there exists an NFA-l M s.t. L(M) = L 153 Module 21 • Closure Properties for LFSA using NFA’s – From now on, when I say NFA, I mean any NFA including an NFA-l unless I add a specific restriction – union (second proof) – concatenation – Kleene closure 154 LFSA closed under set union (again) 155 LFSA closed under set union • Let L1 and L2 be arbitrary languages in LFSA • Let M1 and M2 be NFA’s s.t. L(M1) = L1, L(M2) = L2 – M1 and M2 exist by definition of L1 and L2 in LFSA and the fact that every FSA is an NFA • • • • Construct NFA M3 from NFA’s M1 and M2 Argue L(M3) = L1 union L2 There exists NFA M3 s.t. L(M3) = L1 union L2 L1 union L2 is in LFSA 156 Visualization •Let L1 and L2 be arbitrary languages in LFSA •Let M1 and M2 be NFA’s s.t. L(M1) = L1, L(M2) = L2 •M1 and M2 exist by definition of L1 and L2 in LFSA and the fact that every FSA is an NFA •Construct NFA M3 from NFA’s M1 and M2 •Argue L(M3) = L1 union L2 •There exists NFA M3 s.t. L(M3) = L1 union L2 •L1 union L2 is in LFSA L1 L1 union L2 L2 LFSA M1 M3 M2 NFA’s 157 Algorithm Specification • Input – Two NFA’s M1 and M2 • Output – NFA M3 such that L(M3) = ? NFA M1 NFA M2 A NFA M3 158 Use l-transition NFA M1 NFA M2 a A NFA M3 a l M1 a,b a,b a,b l a,b a,b a,b M2 M3 159 General Case * M1 NFA M1 NFA M2 A NFA M3 l l M2 M3 160 Construction * NFA M1 NFA M2 A NFA M3 • Input – NFA M1 = (Q1, S1, q1, d1, A1) – NFA M2 = (Q2, S2, q2, d2, A2) • Output – NFA M3 = (Q3, S3, q3, d3, A3) – What is Q3? • Q3 = – What is S3? • S3 = S1 = S 2 – What is q3? • q3 = 161 Construction NFA M1 NFA M2 A NFA M3 • Input – NFA M1 = (Q1, S1, q1, d1, A1) – NFA M2 = (Q2, S2, q2, d2, A2) • Output – NFA M3 = (Q3, S3, q3, d3, A3) – What is A3? • A3 = – What is d3? • d3 = 162 Comments • You should be able to execute this algorithm • You should understand the idea behind this algorithm • You should understand how this algorithm can be used to simplify design • You should be able to design new algorithms for new closure properties • You should understand how this helps prove result that regular languages and LFSA are identical – In particular, you should understand how this is used to construct an NFA M from a regular expression r s.t. L(M) = L(r) – To be seen later 163 LFSA closed under set concatenation 164 LFSA closed under set concatenation • Let L1 and L2 be arbitrary languages in LFSA • Let M1 and M2 be NFA’s s.t. L(M1) = L1, L(M2) = L2 – M1 and M2 exist by definition of L1 and L2 in LFSA and the fact that every FSA is an NFA • • • • Construct NFA M3 from NFA’s M1 and M2 Argue L(M3) = L1 concatenate L2 There exists NFA M3 s.t. L(M3) = L1 concatenate L2 L1 concatenate L2 is in LFSA 165 Visualization • Let L1 and L2 be arbitrary languages in LFSA • Let M1 and M2 be NFA’s s.t. L(M1) = L1, L(M2) = L2 – M1 and M2 exist by definition of L1 and L2 in LFSA and the fact that every FSA is an NFA • Construct NFA M3 from NFA’s M1 and M2 • Argue L(M3) = L1 concatenate L2 • There exists NFA M3 s.t. L(M3) = L1 concatenate L2 • L1 concatenate L2 is in LFSA L1 L1 concatenate L2 L2 LFSA M1 M3 M2 NFA’s 166 Algorithm Specification • Input – Two NFA’s M1 and M2 • Output – NFA M3 such that L(M3) = NFA M1 NFA M2 A NFA M3 167 Use l-transition a NFA M1 NFA M2 A NFA M3 a M1 l a,b a,b a,b a,b a,b a,b M2 M3 168 General Case M1 NFA M1 NFA M2 A NFA M3 l l M2 M3 169 Construction NFA M1 NFA M2 A NFA M3 • Input – NFA M1 = (Q1, S1, q1, d1, A1) – NFA M2 = (Q2, S2, q2, d2, A2) • Output – NFA M3 = (Q3, S3, q3, d3, A3) – What is Q3? • Q3 = – What is S3? • S3 = S1 = S 2 – What is q3? • q3 = 170 Construction NFA M1 NFA M2 A NFA M3 • Input – NFA M1 = (Q1, S1, q1, d1, A1) – NFA M2 = (Q2, S2, q2, d2, A2) • Output – NFA M3 = (Q3, S3, q3, d3, A3) – What is A3? • A3 = – What is d3? • d3 = 171 Comments • You should be able to execute this algorithm • You should understand the idea behind this algorithm • You should understand how this algorithm can be used to simplify design • You should be able to design new algorithms for new closure properties • You should understand how this helps prove result that regular languages and LFSA are identical – In particular, you should understand how this is used to construct an NFA M from a regular expression r s.t. L(M) = L(r) – To be seen later 172 LFSA closed under Kleene Closure 173 LFSA closed under Kleene Closure • Let L be arbitrary language in LFSA • Let M1 be an NFA s.t. L(M1) = L – M1 exists by definition of L1 in LFSA and the fact that every FSA is an NFA • • • • Construct NFA M2 from NFA M1 Argue L(M2) = L1* There exists NFA M2 s.t. L(M2) = L1* L1* is in LFSA 174 Visualization • Let L be arbitrary language in LFSA • Let M1 be an NFA s.t. L(M1) =L L1 – M1 exists by definition of L1 in LFSA and the fact that every FSA is an NFA • Construct NFA M2 from NFA M1 • Argue L(M2) = L1* • There exists NFA M2 s.t. L(M2) = L1* • L1* is in LFSA L1* LFSA M1 M2 NFA’s 175 Algorithm Specification • Input – NFA M1 • Output – NFA M2 such that L(M2) = NFA M1 A NFA M2 176 Use l-transition NFA M1 NFA M2 l a M1 A l a M2 177 General Case * NFA M1 A NFA M2 l l M1 l M 2 178 Construction NFA M1 A NFA M2 • Input – NFA M1 = (Q1, S1, q1, d1, A1) • Output – NFA M2 = (Q2, S2, q2, d2, A2) – What is Q2? – What is S2? • S2 = S1 – What is q2? • q2 = 179 Construction NFA M1 A NFA M2 • Input – NFA M1 = (Q1, S1, q1, d1, A1) • Output – NFA M2 = (Q2, S2, q2, d2, A2) – What is A2? • A2 = – What is d2? • d2 = 180 Comments • You should be able to execute this algorithm • You should understand the idea behind this algorithm – Why do we need to make an extra state p? • You should understand how this algorithm can be used to simplify design • You should be able to design new algorithms for new closure properties • You should understand how this helps prove result that regular languages and LFSA are identical – In particular, you should understand how this is used to construct an NFA M from a regular expression r s.t. L(M) = L(r) 181 – To be seen later Module 22 • Regular languages are a subset of LFSA – algorithm for converting any regular expression into an equivalent NFA – Builds on existing algorithms described in previous lectures 182 Regular languages are a subset of LFSA 183 Reg. Lang. subset LFSA • Let L be an arbitrary • Let R be the – R exists by definition of • Construct an – M is constructed from • Argue • There exists an • L is in – By definition of 184 Visualization L L Regular Languages R Regular Expressions LFSA M NFA-l’s 185 Algorithm Specification • Input – Regular expression R • Output – NFA M such that L(M) = Regular expression R A NFA-l M 186 Recursive Algorithm • We have an inductive definition for regular languages and regular expressions • Our algorithm for converting any regular expression into an equivalent NFA is recursive in nature – Base Case – Recursive or inductive Case 187 Base Case • Regular expression R has zero operators – No concatenation, union, Kleene closure – For any alphabet S, only |S| + 2 regular languages can be depicted by any regular expression with zero operators • The empty language f • The language {l} • The |S| languages consisting of one string {a} for all a in S 188 Table lookup • Finite number of base cases means we can use table lookup to handle them f l a b 189 Recursive Case • Regular expression R has at least one operator – This means R is built up from smaller regular expressions using the union, Kleene closure, or concatenation operators – More specifically, there are 3 cases: • R = R1+R2 • R = R1R2 • R = R1* 190 Recursive Calls 1) R = R1 + R2 2) R = R1 R2 3) R = R1* • The algorithm recursively calls itself to generate NFA’s M1 and M2 which accept L(R1) and L(R2) • The algorithm applies the appropriate construction – union – concatenation – Kleene closure to NFA’s M1 and M2 to produce an NFA M such that L(M) = L(R) 191 Pseudocode Algorithm _____________ RegExptoNFA(_____________) { regular expression R1, R2; NFA M1, M2; Modify R by removing unnecessary enclosing parentheses /* Base Case */ If R = a, return (NFA for {a}) /* include l here */ If R = f, return (NFA for {}) /* Recursive Case */ Find “last operator O” of regular expression R Identify regular expressions R1 (and R2 if necessary) M1 = RegExptoNFA(R1) M2 = RegExptoNFA(R2) /* if necessary */ return (OP(M1, M2)) /* OP is chosen based on O */ } 192 Example A: R = (b+a)a* Last operator is concatenation R1 = (b+a) R2 = a* Recursive call with R1 = (b+a) B: R = (b+a) Extra parentheses stripped away Last operator is union R1 = b R2 = a Recursive call with R1 = b 193 Example Continued C: R = b Base case NFA for {b} returned B: return to this invocation of procedure Recursive call where R = R2 = a D: R = a Base case NFA for {a} returned B: return to this invocation of procedure return UNION(NFA for {b}, NFA for {a}) A: return to this invocation of procedure Recursive call where R = R2 = a* 194 Example Finished E: R = a* Last operator is Kleene closure R1 = a Recursive call where R = R1 = a F: R = a Base case NFA for {a} returned E: return to this invocation of procedure return (KLEENE(NFA for {a})) A: return to this invocation of procedure return CONCAT(NFA for {b,a}, NFA for {a}*) 195 Pictoral View concatenate (b|a)a* a l l l b (b|a) b lunion a b a l l a l l Kleene a* Closure l a l aa 196 Parse Tree We now present the “parse” tree for regular expression (b+a)a* concatenate union b Kleene closure a a 197 Module 23 • Regular languages review – Several ways to define regular languages – Two main types of proofs/algorithms • Relative power of two computational models proofs/constructions • Closure property proofs/constructions – Language class hierarchy • Applications of regular languages 198 Defining regular languages 199 Three definitions • LFSA – A language L is in LFSA iff there exists an FSA M s.t. L(M) = L • LNFA – A language L is in LNFA iff there exists an NFA M s.t. L(M) = L • Regular languages – A language L is regular iff there exists a regular expression R s.t. L(R) = L • Conclusion – All these language classes are equivalent – Any language which can be represented using any one of these models can be represented using either of the other two models 200 Two types of proofs/constructions 201 Relative power proofs • These proofs work between two language classes and two computational models • The crux of these proofs are algorithms which behave as follows: – Input: One program from the first computational model – Output: A program from the second computational model that is equivalent in function to the first program 202 Closure property proofs • These proofs work within a single language class and typically within a single computational model • The crux of these proofs are algorithms which behave as follows: – Input: 1 or 2 programs from a given computational model – Output: A third program from the same computational model that accepts/describes a third language which is a combination of the languages accepted/described by the two input programs 203 Comparison L L1 L L1 intersect L2 L2 LNFA LFSA LFSA M1 M NFA’s M3 M’ FSA’s M2 FSA’s 204 Language class hierarchy ? H H regular REC RE All languages over alphabet S 205 Three remaining topics • Myhill-Nerode Theorem – Provides technique for proving a language is not regular – Also represents fundamental understanding of what a regular language is • Decision problems about regular languages – Most are solvable in contrast to problems about recursive languages • Pumping lemma – Provides technique for proving a language is not regular 206 Module 24 • Myhill-Nerode Theorem – – – – distinguishability equivalence classes of strings designing FSA’s proving a language L is not regular 207 Distinguishability 208 Distinguishable and Indistinguishable • String x is distinguishable from string y with respect to language L iff – there exists a string z such that • xz is in L and yz is not in L OR • xz is not in L and yz is in L • String x is indistinguishable from string y with respect to language L iff – for all strings z, • xz and yz are both in L OR • xz and yz are both not in L 209 Example • Let EVEN-ODD be the set of strings over {a,b} with an even number of a’s and an odd number of b’s – Is the string aa distinguishable from the string bb with respect to EVEN-ODD? – Is the string aa distinguishable from the string ab with respect to EVEN-ODD? 210 Equivalence classes of strings 211 Definition of equivalence classes • Every language L partitions S* into equivalence classes via indistinguishability – Two strings x and y belong to the same equivalence class defined by L iff x and y are indistinguishable w.r.t L – Two strings x and y belong to different equivalence classes defined by L iff x and y are distinguishable w.r.t. L 212 Example How does EVEN-ODD partition {a,b}* into equivalence classes? Strings with an EVEN number of a’s and an EVEN number of b’s Strings with an ODD number of a’s and an EVEN number of b’s Strings with an EVEN number of a’s and an ODD number of b’s Strings with an ODD number of a’s and an ODD number of b’s 213 Second Example Let 1MOD3 be the set of strings over {a,b} whose length mod 3 = 1. How does 1MOD3 partition {a,b}* into equivalence classes? Length mod 3 = 0 Length mod 3 = 1 Length mod 3 = 2 214 Designing FSA’s 215 Designing an FSA for EVEN-ODD l a b b Even Even Odd Even Even Odd Odd Odd a b a ab 216 Length mod 3 = 0 Designing an FSA for 1MOD3 Length mod 3 = 1 Length mod 3 = 2 a,b l a,b a a,b aa 217 Proving a language is not regular 218 Third Example • Let EQUAL be the set of strings x over {a,b} s.t. the number of a’s in x = the number of b’s in x • How does EQUAL partition {a,b}* into equivalence classes? • How many equivalence classes are there? • Can we construct a finite state automaton for EQUAL? 219 Myhill-Nerode Theorem 220 Theorem Statement • Two part statement – If L is regular, then L partitions S* into a finite number of equivalence classes – If L partitions S* into a finite number of equivalence classes, then L is regular • One part statement – L is regular iff L partitions S* into a finite number of equivalence classes 221 Implication 1 • Method for constructing FSA’s to accept a language L – – – – Identify equivalence classes defined by L Make a state for each equivalence class Identify initial and accepting states Add transitions between the states • You can use a canonical element of each equivalence class to help with building the transition function d 222 Implication 2 • Method for proving a language L is not regular – Identify equivalence classes defined by L – Show there are an infinite number of such equivalence classes • Table format may help, but it is only a way to help illustrate that there are an infinite number of equivalence classes defined by L 223 Proving a language is not regular revisited 224 Proving EQUAL is not regular • Let EQUAL be the set of strings x over {a,b} s.t. the number of a’s in x = the number of b’s in x • We want to show that EQUAL partitions {a,b}* into an infinite number of equivalence classes • We will use a table that is somewhat reminiscent of the table used for diagonalization – Again, you must be able to identify the infinite number of equivalence classes being defined by the table. They ultimately represent the proof that EQUAL or whatever language you are working with is not regular. 225 Table * a aa aaa aaaa aaaaa ... b IN OUT OUT OUT OUT ... bb OUT IN OUT OUT OUT ... bbb OUT OUT1 IN OUT OUT ... bbbb OUT OUT OUT IN OUT ... bbbbb OUT OUT OUT OUT IN ... … … … … … … The strings being distinguished are the rows. The tables entries indicate that the concatenation of the row string with the column string is in or not in EQUAL. Each complete column shows one row string is distinguishable from all the other row strings. 226 Concluding EQUAL is nonregular * • We have shown that EQUAL partitions {a,b}* into an infinite number of equivalence classes – In this case, we only identified some of the equivalence classes defined by EQUAL, but that is sufficient • Thus, the Myhill-Nerode Theorem implies that EQUAL is nonregular 227 Summary • Myhill-Nerode Theorem and what it says – It does not say a language L is regular iff L is finite • Many regular languages such as S* are not finite – It says that a language L is regular iff L partitions S* into a finite number of equivalence classes • Provides method for designing FSA’s • Provides method for proving a language L is not regular – Show that L partitions S* into an infinite number of equivalence classes 228 Two/Three Types of Problems • Create a table that helps prove that a specific language L is not regular – You get to choose the “row” and “column” strings – I choose the “row” strings • Identify the equivalence classes defined by L as highlighted by a given table 229 Module 25 • Decision problems about regular languages – Basic problems are solvable • halting, accepting, and emptiness problems – Solvability of other problems • answer-preserving input transformations to basic problems 230 Programs • In this unit, our programs are the following three types of objects – FSA’s – NFA’s – regular expressions • Previously, they were C++ programs – Review those topics after mastering today’s examples 231 Basic Decision Problems (and algorithms for solving them) 232 Halting Problem • Input – FSA M – Input string x to M • Question – Does M halt on x? • Give an algorithm for solving the FSA halting problem. 233 Accepting Problem • Input – FSA M – Input string x to M • Question – Is x in L(M)? • Give an algorithm ACCEPT for solving the accepting problem. 234 Empty Language Problem • Input – FSA M • Question – Is L(M)={}? • Give an algorithm for solving the empty language problem. – Don’t look ahead to the next slide. 235 Algorithms for solving empty language problem • Algorithm 1 – View FSA M as a directed graph (nodes, arcs) – See if any accepting node is reachable from the start node • Algorithm 2 – Let n be the number of states in FSA M – Run ACCEPT(M,x) for all input strings of length < n – If any are accepted THEN no ELSE yes • Why is algorithm 2 correct? 236 Solving Other Problems (using answer-preserving input transformations) 237 Complement Empty Problem • Input – FSA M • Question – Is (L(M))c = {}? • Show how to use an answer-preserving input transformation to help solve this problem – Show that the Complement Empty problem transforms to the Empty Language problem – Don’t look at next two slides 238 Algorithm Description • Convert input FSA M into an FSA M’ such that L(M’) = (L(M))c – We do this by applying the algorithm which we used to show that LFSA is closed under complement • Feed FSA M’ into algorithm which solves the empty language problem • If that algorithm returns yes THEN yes ELSE no 239 Input Transformation Illustrated FSA M Complement FSA M’ Construction Algorithm for solving empty language problem Yes/No Algorithm for complement empty problem The complement construction algorithm is the answer-pres. input transformation. If M is a yes input instance of CE, then M’ is a yes input instance of EL. If M is a no input instance of CE, then M’ is a no input instance of EL. 240 NFA Empty Problem • Input – NFA M • Question – Is L(M)={}? • Show how to use answer-preserving input transformations to help solve this problem – Show that the NFA Empty problem transforms to the Empty Language problem 241 Input Transformation Yes/No Algorithm for NFA empty problem 242 Equal Problem • Input – FSA’s M1 and M2 • Question – Is L(M1) = L(M2)? • Show how to use answer-preserving input transformations to solve this problem – Try and transform this problem to the empty language problem – If L(M1) = L(M2), then what combination of L(M1) and L(M2) must be empty? 243 Input Transformation Illustrated Yes/No Algorithm for Equal problem 244 Summary • Decision problems with programs as inputs • Basic problems – You need to develop algorithms from scratch based on properties of FSA’s • Solving new problems – You need to figure out how to combine the various algorithms we have seen in this unit to solve the given problem 245 Module 26 • Pumping Lemma – A technique for proving a language L is NOT regular – What does the Pumping Lemma mean? – Proof of Pumping Lemma 246 Pumping Lemma How do we use it? 247 Pumping Condition • A language L satisfies the pumping condition if: – there exists an integer n > 0 such that – for all strings x in L of length at least n – there exist strings u, v, w such that • • • • x = uvw and |uv| <= n and |v| >= 1 and For all k >= 0, uvkw is in L 248 Pumping Lemma • All regular languages satisfy the pumping condition Regular languages “Pumping Languages” All languages over {a,b} 249 Pumping Implications Regular • We can use the pumping lemma to prove a language L is not regular – How? • We cannot use the pumping lemma to prove a language is regular – How might we try to use the pumping lemma to prove that a language L is regular and why does it fail? 250 Pumping Lemma What does it mean? 251 Pumping Condition • A language L satisfies the pumping condition if: – there exists an integer n > 0 such that – for all strings x in L of length at least n – there exist strings u, v, w such that • • • • x = uvw and |uv| <= n and |v| >= 1 and For all k >= 0, uvkw is in L 252 v can be pumped 1) x in L 2) x = uvw 3) For all k >= 0, uvkw is in L • Let x = abcdefg be in L • Then there exists a substring v in x such that v can be repeated (pumped) in place any number of times and the resulting string is still in L – uvkw is in L for all k >= 0 • For example – v = cde • uv0w = uw = abfg is in L • uv1w = uvw = abcdefg is in L • uv2w = uvvw = abcdecdefg is in L • uv3w = uvvvw = abcdecdecdefg is in L • … 253 What the other parts mean • A language L satisfies the pumping condition if: – there exists an integer n > 0 such that • defer what n is till later – for all strings x in L of length at least n • x must be in L and have sufficient length – there exist strings u, v, w such that • x = uvw and • |uv| <= n and – v occurs in the first n characters of x • |v| >= 1 and – v is not l • For all k >= 0, uvkw is in L 254 Examples • Example 1 – Let L be the set of even length strings over {a,b} – Let x = abaa – Let n = 2 – What are the possibilities for v? • abaa, abaa • abaa – Which one satisfies the pumping lemma? 255 Examples * • Example 2 – Let L be the set of strings over {a,b} where the number of a’s mod 3 is 1 – Let x = abbaaa – Let n = 3 – What are the possibilities for v? • abbaaa, abbaaa, abbaaa • abbaaa, abbaaa • abbaaa – Which ones satisfy the pumping lemma? 256 Pumping Lemma Proof 257 High Level Outline • Let L be an arbitrary regular language • Let M be an FSA such that L(M) = L – M exists by definition of LFSA and the fact that regular languages and LFSA are identical • Show that L satisfies the pumping condition – Use M in this part • Pumping Lemma follows 258 First step: n+1 prefixes of x • Let n be the number of states in M • Let x be an arbitrary string in L of length at least n – Let xi denote the ith character of string x • There are at least n+1 distinct prefixes of x – – – – – – – length 0: l length 1: x1 length 2: x1x2 ... length i: x1x2 … xi ... length n: x1x2 … xi … xn 259 Example • Let n = 8 • Let x = abcdefgh • There are 9 distinct prefixes of x – – – – – length 0: l length 1: a length 2: ab ... length 8: abcdefgh 260 Second step: Pigeon-hole Principle • As M processes string x, it processes each prefix of x – In particular, each prefix of x must end up in some state of M • Situation – There are n+1 distinct prefixes of x – There are only n states in M • Conclusion – At least two prefixes of x must end up in the same state of M • Pigeon-hole principle – Name these two prefixes p1 and p2 261 Third step: Forming u, v, w • Setting: – Prefix p1 has length i – Prefix p2 has length j > i • prefix p1 of length i: x1x2 … xi • prefix p2 of length j: x1x2 … xi xi+1 … xj • Forming u, v, w – – – – Set u = p1 = x1x2 … xi Set v = xi+1… xj Set w = xj+1 … x|x| x1x2 … xi xi+1… xj xj+1 … x|x| u v w 262 Example 1 0 1 2 3 4 • Let M be a 5-state FSA that accepts all strings over {a,b,c,…,z} whose length mod 5 = 3 • Consider x = abcdefghijklmnopqr, a string in L • What are the two prefixes p1 and p2? • What are u, v, w? 263 0 Example 2 0 1 1 1 0 2 • Let M be a 3-state FSA that accepts all strings over {0,1} whose binary value mod 3 = 1 • Consider x = 10011, a string in L • What are the two prefixes p1 and p2? • What are u, v, w? 264 Fourth step: Showing u, v, w satisfy all the conditions • |uv| <= n – uv = p2 – p2 is one of the first n+1 prefixes of string x • |v| >= 1 – v consists of the characters in p2 after p1 – Since p2 and p1 are distinct prefixes of x, v is not l • For all k >= 0, uvkw in L – u=p1 and uv=p2 end up in the same state q of M • This is how we defined p1 and p2 – Thus for all k >= 0, uvk ends up in state q – The string w causes M to go from state q to an accepting state 265 Example 1 again 0 1 2 3 4 • Let M be a 5-state FSA that accepts all strings over {a,b,c,…,z} whose length mod 5 = 3 • Consider x = abcdefghijklmnopqr, a string in L • What are u, v, w? – u=l – v = abcde – w = fghijklmnopqr • |uv| = 5 <= 5 • |v| = 5 >= 1 • For all t>=0, (abcde)tfghijklmnopqr is in L 266 0 Example 2 again 0 1 1 1 0 2 • Let M be a 3-state FSA that accepts all strings over {0,1} whose binary interpretation mod 3 = 1 • Consider x = 10011, a string in L • What are u, v, w? – u=1 – v = 00 – w = 11 • |uv| = 3 <= 3 • |v| = 2 >= 1 • For all k>=0, 1(00)k11 is in L 267 Pumping Lemma • A language L satisfies the pumping condition if: – there exists an integer n > 0 such that – for all strings x in L of length at least n – there exist strings u, v, w such that • • • • x = uvw and |uv| <= n and |v| >= 1 and For all k >= 0, uvkw is in L • Pumping Lemma: All regular languages satisfy the pumping condition 268 Module 27 • Applications of Pumping Lemma – General proof template • What is the same in every proof • What changes in every proof – Incorrect pumping lemma proofs – Some rules of thumb 269 Pumping Lemma Applying it to prove a specific language L is not regular 270 How we use the Pumping Lemma • We choose a specific language L – For example, {ajbj | j > 0} • We show that L does not satisfy the pumping condition • We conclude that L is not regular 271 Showing L “does not pump” • A language L satisfies the pumping condition if: – there exists an integer n > 0 such that – for all strings x in L of length at least n – there exist strings u, v, w such that • • • • x = uvw and |uv| <= n and |v| >= 1 and For all k >= 0, uvkw is in L • A language L does not satisfy the pumping condition if: – for all integers n of sufficient size – there exists a string x in L of length at least n such that – for all strings u, v, w such that • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L 272 Example Proof • A language L does not satisfy the pumping condition if: – for all integers n of sufficient size – there exists a string x in L of length at least n such that – for all strings u, v, w such that • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L • Proof that L = {aibi | i>0} does not satisfy the pumping condition • Let n be the integer from the pumping lemma • Choose x = anbn • Consider all strings u, v, w s.t. • x = uvw and • |uv| <= n and • |v| >= 1 • Argue that uvkw is not in L for some k >= 0 – Argument must apply to all possible u,v,w – Continued on next slide 273 Example Proof Continued • Proof that L = {aibi | i>0} does not satisfy the pumping condition • Let n be the integer from the pumping lemma • Choose x = anbn • Consider all strings u, v, w s.t. • x = uvw and • |uv| <= n and • |v| >= 1 • Argue that uvkw is not in L for some k >= 0 – Argument must apply to all possible u,v,w – Continued on right • uv0w = uw is not in L – uv contains only a’s • why? – uw = an-|v|bn • Follows from previous line and uvw = x = anbn – uw contains fewer a’s than b’s • why? – Therefore, uw is not in L • Therefore L does not satisfy the pumping condition 274 Alternate choice of k • Proof that L = {aibi | i>0} does not satisfy the pumping condition • Let n be the integer from the pumping lemma • Choose x = anbn • Consider all strings u, v, w s.t. • x = uvw and • |uv| <= n and • |v| >= 1 • Argue that uvkw is not in L for some k >= 0 – Argument must apply to all possible u,v,w – Continued on right • uv2w = uvvw is not in L – uv contains only a’s • why? – uvvw = an+|v|bn • follows from previous line and uvw = x = anbn – uvvw contains more a’s than b’s • why? – Therefore, uvvw is not in L • Therefore L does not satisfy the pumping condition 275 Pumping Lemma Some bad applications of the pumping lemma 276 Bad Pumping Lemma Applications • We now look at some examples of bad applications of the pumping lemma • We work with the language EQUAL consisting of the set of strings over {a,b} such that the number of a’s equals the number of b’s • We focus first on bad choices of string x • We then consider another flawed technique 277 First bad choice of x • A language L does not satisfy the pumping condition if: – Let n be the integer from the pumping lemma – there exists a string x in L of length at least n such that – for all strings u, v, w such that • Let n be the integer from the pumping lemma • Choose x = a10b10 – What is wrong with this choice of x? • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L 278 Second bad choice of x • A language L does not satisfy the pumping condition if: – Let n be the integer from the pumping lemma – there exists a string x in L of length at least n such that – for all strings u, v, w such that • Let n be the integer from the pumping lemma • Choose x = anb2n – What is wrong with this choice of x? • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L 279 Third bad choice of x • A language L does not satisfy the pumping condition if: – Let n be the integer from the pumping lemma – there exists a string x in L of length at least n such that – for all strings u, v, w such that • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L • Let n be the integer from the pumping lemma • Choose x = (ab)n – What is wrong with this choice of x? • The problem is there is a choice of u, v, w satisfying the three conditions such that for all k >=0, uvkw is in L • What is an example of such a u, v, w? 280 Find the flaw in this proof • A language L does not satisfy the pumping condition if: – Let n be the integer from the pumping lemma – there exists a string x in L of length at least n such that – for all strings u, v, w such that • x = uvw and • |uv| <= n and • |v| >= 1 – There exists a k >= 0 such that uvkw is not in L • Let n be the integer from the pumping lemma • Choose x = anbn • Let u = a2, v =a, w = an-3bn – |uv| = 3 <= n – |v| = 1 • Choose k = 2 • Argue uv2w is not in EQUAL – uv2w = uvvw = a2aaan-3bn = an+1bn – There is one more a than b in uv2w – Thus uv2w is not in L 281 Pumping Lemma Two rules of thumb 282 Two Rules of Thumb * • Try to make the first n characters of x identical – For EQUAL, choose x = anbn rather than (ab)n • Simplifies case analysis as v only contains a’s • Try k=0 or k=2 – k=0 • This reduces number of occurrences of that first character – k=2 • This increases number of occurrences of that first character 283 Summary • We use the Pumping Lemma to prove a language is not regular – Note, does not work for all nonregular languages, though • Choosing a good string x is first key step • Choosing a good integer k is second key step • Must apply argument to all legal u, v, w 284