CSE 2001 - Introduction to the Theory of Computation
Week 5 - Jun 9, 2014
• Today:
  – NFA to regular expression
  – Using the pumping lemma
  – Mealy machines for controls
  – New: Chapter 2, context-free languages and grammars
  – Review of test 1

Announcements
• Tutorials are still 4:30-6 Wednesday, CB 129
• Test 2 is planned for July 7. Big one
• (In the future: a final review session is planned for Wednesday, July 23, 4:30-6 pm)
• Office hours: Thursdays 4 to 5 pm in Lassonde 2013 for individual help (note room change)
• Test 1 marks are posted. Averages: 13.5 - 14
• Next week: Assignment 2 (part 1: first 3 questions)
• Details on using submit: to submit a file named foo.txt, you can go to the submit web server, https://webapp.eecs.yorku.ca/submit/

Characterizing the languages of regular expressions
• Let RE be the set of all languages that can be represented by regular expressions.
• We are proving that RE and the regular languages are the same class of languages, i.e., RE = RL.
Proof:
  Step 1: For every regular expression there is an equivalent NFA (Lemma 1.55).
  Step 2: For every DFA there is an equivalent regular expression (Lemma 1.60; intermediate step: GNFA).
04/29/2014 CSE 2001

Generalized Nondeterministic Finite Automata
• It is not obvious how a regular expression can express the language of a DFA.
• To accomplish this it is easiest to use another model for the regular languages, the generalized NFA (GNFA).
• The main difference is that a transition is labelled by an arbitrary regular expression R instead of just a symbol from Σ.
• In one step the GNFA can read an arbitrarily long string made of the current and subsequent input symbols, and can make a transition if that string is in L(R).
• It turns out that any language recognized by a GNFA is regular.

GNFA
• A generalized NFA looks like an ordinary NFA except:
  – Like a DFA, it has a complete set of transitions, one from every state to every state, except that:
    – there are no transitions into the start state, and
    – there are no transitions out of the single accept state;
  – ε-transitions are allowed;
  – (the main thing) transitions are labelled by regular expressions, not just by symbols of Σ;
  – there is only one transition from any state to another: multiple transitions are combined using ∪.

Example GNFA (some labels not shown)
[Figure: GNFA with states $q_S$, $q_2$, $q_A$; visible labels include $0110$, $0$, $\emptyset$, $\varepsilon$, $01^*$ and $0^* \cup 11$]
• In this example an input 0111... could cause a transition from $q_S$ to $q_2$ by reading 011 from the input. So could 0 alone, or 01: a nondeterministic choice.

RL = RE
• Lemma 1.60: If a language is regular, then it can be described by a regular expression.
• Proof strategy using generalized NFAs (GNFAs):
  1. Regular implies there is an equivalent DFA, by definition.
  2. Convert that DFA to an equivalent (simple) GNFA.
  3. Convert the GNFA to an equivalent 2-state GNFA.
  4. The regular expression is the label on the single remaining transition of the 2-state GNFA.
• GNFAs are extended (generalized) NFAs that are defined to have regular expressions as labels on their transitions, instead of only symbols from $\Sigma \cup \{\varepsilon\}$.

Generalized NFA - Definition 1.64
A generalized nondeterministic finite automaton is $M = (Q, \Sigma, \delta, q_{start}, q_{accept})$ with
• $Q$: a finite set of states
• $\Sigma$: the input alphabet (let $\mathcal{R}$ be the set of all regular expressions over $\Sigma$)
• $q_{start}$: the start state
• $q_{accept}$: the (unique) accept state
• transition function $\delta: (Q \setminus \{q_{accept}\}) \times (Q \setminus \{q_{start}\}) \to \mathcal{R}$
$\delta(q_i, q_j) = R_k$ means that when the machine is in $q_i$ and the remaining input begins with a string $w \in L(R_k)$, a transition from $q_i$ to $q_j$ can be made (and $w$ is removed from the input on this computation path).

GNFA - Computation and Acceptance (Sipser p. 73)
• A GNFA accepts a string $w \in \Sigma^*$ if $w = w_1 w_2 \cdots w_k$ (each $w_i \in \Sigma^*$) and there is a sequence of states $r_0, r_1, \ldots, r_k$, each $r_i \in Q$, such that:
  1. $r_0 = q_{start}$
  2. $r_k = q_{accept}$
  3. for each $i$ with $1 \le i \le k$, $w_i \in L(R_i)$, where $R_i = \delta(r_{i-1}, r_i)$ is the label on the transition from $r_{i-1}$ to $r_i$.
• The language recognized by a GNFA is the set of strings it accepts.
• It is not obvious how to build GNFAs directly, but we can use them here.
• It turns out that a language is recognized by a GNFA iff the language is regular (proof: exercise).

Characteristics of GNFAs
• $\delta: (Q \setminus \{q_{accept}\}) \times (Q \setminus \{q_{start}\}) \to \mathcal{R}$
  (Other than from the accept state, there are transitions from every state to every state except the start state, self included.)
• The interior $Q \setminus \{q_{accept}, q_{start}\}$ is fully connected by $\delta$.
• From $q_{start}$ there are only outgoing transitions.
• To $q_{accept}$ there are only incoming transitions.
• Impossible $q_i \to q_j$ transitions are labelled $\delta(q_i, q_j) = \emptyset$.
[Figure: two-state GNFA $q_S \xrightarrow{R} q_A$ with $R \in \mathcal{R}$]
Observation: This GNFA recognizes the language L(R), whatever regular expression R is. Why is this true?

Proof idea of Lemma 1.60
Given a DFA M:
• Construct an equivalent GNFA M' with k ≥ 2 states.
• Remove the internal states one by one until k = 2, while keeping the regular expressions "right" (together they denote all strings taking the automaton from one state to another).
• The final GNFA has the form $q_S \xrightarrow{R} q_A$, and this regular expression R satisfies L(R) = L(M).

Simplified example: fixing one path $q_1 \to q_a$ when ripping out $q_2$
[Figure: GNFA over $\Sigma = \{a, b, c, d\}$ with states $q_s$, $q_1$, $q_2$, $q_a$ (labels include $a \cup \varepsilon$, $b$, $c$, $d$, $bc$, $\emptyset$); after ripping out $q_2$ the $q_1 \to q_a$ label becomes $d \cup (a \cup \varepsilon)c^*bc$]
To remove $q_2$, we fix up the $q_1$ to $q_a$ arc so that any string that could take the machine from $q_1$ to $q_a$ through $q_2$ can still take the machine from $q_1$ to $q_a$ without using $q_2$. (After that we must fix the paths between all other pairs of states so that the loss of $q_2$ does not affect them either.)

Summary
[Figure: $q_i \xrightarrow{R_1} q_{rip}$, self-loop $R_2$ on $q_{rip}$, $q_{rip} \xrightarrow{R_3} q_j$ and $q_i \xrightarrow{R_4} q_j$; after ripping, $q_i \xrightarrow{R_4 \cup R_1R_2^*R_3} q_j$]
To fix the transition from $q_i$ to $q_j$ so that it repairs any strings lost by deleting $q_{rip}$, replace $R_4$ by $R_4 \cup R_1R_2^*R_3$, i.e. $\delta(q_i, q_j)$ becomes $\delta(q_i, q_j) \cup \delta(q_i, q_{rip})\,\delta(q_{rip}, q_{rip})^*\,\delta(q_{rip}, q_j)$.
Do this for every transition from any $q_i$ to any $q_j$ other than $q_{rip}$ (including $i = j$).

Proof of Lemma 1.60
• Let M be a DFA with k states.
• Create an "equivalent" GNFA M' with k+2 states.
• Reduce M' in k steps to a GNFA M'' with 2 states, always maintaining equivalence of the language recognized.
• The resulting GNFA describes a single regular expression R that expresses all and only the strings that can take the original M from its start state to an accept state.
• The regular language L(M) equals the language L(R) of the regular expression R.

DFA M → equivalent GNFA M'
Let M have k states $Q = \{q_1, \ldots, q_k\}$.
• Add two states $q_{accept}$ and $q_{start}$.
• Connect $q_{start}$ to the old start state $q_1$: $q_{start} \xrightarrow{\varepsilon} q_1$.
• Connect each old accepting state $q_j$ to $q_{accept}$: $q_j \xrightarrow{\varepsilon} q_{accept}$.
• Complete missing transitions with $q_i \xrightarrow{\emptyset} q_j$.
• Join multiple transitions: $q_i \xrightarrow{0} q_j$ and $q_i \xrightarrow{1} q_j$ become $q_i \xrightarrow{0 \cup 1} q_j$.

Convert(M): remove an internal state of GNFA M to get M'
If the GNFA M has more than 2 states, "rip" an internal state $q_{rip}$ to get an equivalent GNFA M' by:
• Removing state $q_{rip}$: $Q' = Q \setminus \{q_{rip}\}$
• Changing the transition function $\delta$ to
  $\delta'(q_i, q_j) = \delta(q_i, q_j) \cup \big(\delta(q_i, q_{rip})\,(\delta(q_{rip}, q_{rip}))^*\,\delta(q_{rip}, q_j)\big)$
  for every $q_i \in Q' \setminus \{q_{accept}\}$ and $q_j \in Q' \setminus \{q_{start}\}$
[Figure: $q_i \xrightarrow{R_1} q_{rip}$, loop $R_2$, $q_{rip} \xrightarrow{R_3} q_j$ and $q_i \xrightarrow{R_4} q_j$ become $q_i \xrightarrow{R_4 \cup (R_1R_2^*R_3)} q_j$]

Proof of Lemma 1.60 - continued
• Use induction (on the number of states of the GNFA) to prove correctness of the conversion procedure.
• Base case: k = 2.
• Inductive step: 2 cases: $q_{rip}$ is / is not on an accepting path.

Convert
• Define a recursive procedure CONVERT(M) that takes a GNFA M and returns a regular expression R with L(R) = L(M):
  CONVERT(M):
    Say M = (Q, Σ, δ, q_s, q_a).
    If M has only 2 states, return δ(q_s, q_a).
    If M has more than 2 states:
      Select any internal state of M, call it q_rip (e.g. pick the highest-numbered internal state).
      Define M' = (Q \ {q_rip}, Σ, δ', q_s, q_a), where
        δ'(q_i, q_j) = δ(q_i, q_j) ∪ (δ(q_i, q_rip) (δ(q_rip, q_rip))* δ(q_rip, q_j)).
      Return CONVERT(M').

Claim 1.65: If M is a k-state GNFA, k ≥ 2, then the regular expression returned by CONVERT(M) describes L(M).
• Base case: k = 2.
  – Matching the regular expression R labelling the $q_s$ to $q_a$ transition is the only way an input string can be accepted by the GNFA. So L(M) = L(R), as required.
• Induction step: k > 2.
  – By the induction hypothesis (IH) we can assume that the claim is true for k−1 states, and we must prove it for k states.
  – So let M = (Q, Σ, δ, q_s, q_a) be a k-state GNFA and let M' be as in the procedure CONVERT(M).
  – By the Ripping Lemma that follows, L(M') = L(M).
  – M' has k−1 states, so by the IH, CONVERT(M') returns a regular expression R with L(R) = L(M'). Since CONVERT(M) returns CONVERT(M') and L(M') = L(M), we get L(R) = L(M).

Ripping Lemma
• Lemma: L(M) = L(M'), where M' is the result of ripping any internal state $q_{rip}$ out of M as described before.
• Proof (⊆): Say $w \in L(M)$, i.e. $w = w_1 w_2 \cdots w_n$, each $w_h$ is a string over $\Sigma$, and there is a sequence of states $r_0, \ldots, r_n$ such that $w_h \in L(\delta(r_{h-1}, r_h))$, $r_0 = q_s$ and $r_n = q_a$.
• If none of the r's is $q_{rip}$, then there is also an accepting computation of w by M', because each label $\delta'(q_i, q_j)$ of M' is the union of the label $\delta(q_i, q_j)$ of M with another regular expression.
• If $q_{rip}$ appears once in the accepting computation of M on w, let $r_i$ be the state visited just before $q_{rip}$ and $r_j$ the state visited just after it. Then w can be divided into segments a, b and c such that a takes M from the start state to $r_i$, b takes M from $r_i$ to $q_{rip}$ and from $q_{rip}$ to $r_j$, and c takes M from $r_j$ to $q_a$.
• As argued above, the labels on the transitions of M from $q_s$ up to the visit to $r_i$ are parts of the labels on transitions of M', and similarly from $r_j$ to $q_a$.
  Hence M' can read a while moving from $q_s$ to $r_i$, and c while moving from $r_j$ to $q_a$. In M', the regular expression $\delta(r_i, q_{rip})\,(\delta(q_{rip}, q_{rip}))^*\,\delta(q_{rip}, r_j)$, which is part of $\delta'(r_i, r_j)$, matches b (using $\delta(q_{rip}, q_{rip})$ 0 times), so M' can move from $r_i$ to $r_j$ on b.
• If there are multiple visits to $q_{rip}$, group consecutive visits into blocks. Each block is entered from some state $r_i \ne q_{rip}$ and left to some state $r_j \ne q_{rip}$, and the string read across it is in $L(\delta(r_i, q_{rip})\,(\delta(q_{rip}, q_{rip}))^*\,\delta(q_{rip}, r_j)) \subseteq L(\delta'(r_i, r_j))$, with the star used once for each repeated visit. So each block can be duplicated by a single transition of $\delta'$ as above, while the parts between blocks use transitions not involving $q_{rip}$.
(The other direction (⊇) is left as an exercise.)

Recap: RL = RE
• Let R be a regular expression; then there exists an NFA M such that L(R) = L(M).
• The language L(M) of a DFA M equals the language L(M') of a GNFA M', which can be converted state by state into an equivalent two-state GNFA M''.
• The label R of the transition $q_{start} \xrightarrow{R} q_{accept}$ of M'' satisfies L(R) = L(M'').
• Hence: RE ⊆ NFA = DFA ⊆ GNFA ⊆ RE.

4-state GNFA to 3-state GNFA, ripping out $q_1$
[Figure: a 4-state GNFA reduced to a 3-state GNFA by ripping out $q_1$]

Regular languages and non-regular languages
• Every finite language is regular (proof: exercise).
• There are some infinite languages that we can prove are not regular.
• The proofs are by contradiction, e.g. if the language were regular, that would contradict the pumping lemma, the closure properties, etc.
• The essential idea in proving the pumping lemma is that if the language contains a sufficiently long string, the DFA accepting it must repeat a state while reading that string.

Repeating DFA paths
• Consider a DFA M with $p = |Q|$ states.
• On any accepted string w of length p, p+1 states get visited.
• So for any accepted string w of length at least p, some state must be visited twice (pigeonhole principle): there is a j such that the computation of M on input w goes through states like $q_1, \ldots, q_j, \ldots, q_j, \ldots, q_k$, i.e. some state on the path must repeat.
[Figure: accepting path from $q_1$ to $q_k$ with a loop at $q_j$]
• The action of the DFA in $q_j$ on a given symbol is always the same. If we repeat (or skip) the $q_j, \ldots, q_j$ part, the new path will again be an accepting path.

Pumping Lemma (Thm 1.37)
For every regular language L there is a pumping length p such that any string $w \in L$ with $|w| \ge p$ can be broken into three parts, $w = xyz$, such that:
  1) $xy^iz \in L$ for every $i \in \{0, 1, 2, \ldots\}$
  2) $|y| \ge 1$
  3) $|xy| \le p$
• Note that 1) implies that $xz \in L$ (take i = 0).
• 2) says that y cannot be the empty string ε.
• Condition 3) is not always used, but it shows that the repeated string is not too far from the beginning of the string; this can often help.

Use of the pumping lemma
• To prove a language B is not regular:
  – Assume B is regular (to get a proof by contradiction).
  – If B is regular, then the pumping lemma must apply to B.
  – So choose a sufficiently long string s in B.
  – By the PL, s can be broken into parts xyz satisfying $|y| \ge 1$, $|xy| \le p$, and $xy^iz \in B$ for every i (i.e. s has the pumping property).
  – Use the above fact to get a contradiction:
    – Choose a "nice" string s to make it easy to get a contradiction.
    – Choose a value for i.
    – Prove that $xy^iz$ is not in B.
    – But the PL says $xy^iz$ would have to be in B if B were regular.
  – Contradiction.
• Therefore B is not regular.

Example: "ww"
Let $F = \{ ww \mid w \in \{0,1\}^* \}$ (Ex. 1.40).
Assume (for the sake of contradiction) that F is regular, so the PL applies to F.
Let p be the pumping length for F, and choose $s = 0^p10^p1$, so s is in F (s can be any string in F of length at least p).
Then s can be written as $s = xyz = 0^p10^p1$ satisfying the PL, and condition 3) tells us that $|xy| \le p$.
Because of condition 3), there is only one possibility for y: $y = 0^k$ for some $k \ge 1$.
So if we pump y (taking i = 2), we get $xyyz = 0^{p+k}10^p1 \notin F$.
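Condition 3) is what makes this case analysis short: every admissible split puts y inside the first block of 0s. The claim that pumping leaves F for every such split can be checked by brute force for a few concrete p. This is a minimal Python illustration, not part of the proof: the helper names `in_F`, `admissible_splits` and `every_split_fails` are hypothetical, and since the real pumping length is unknown, checking finitely many p only illustrates the argument.

```python
def in_F(s: str) -> bool:
    """Membership in F = { ww | w in {0,1}* }: s is some string written twice."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]


def admissible_splits(s: str, p: int):
    """Yield every (x, y, z) with s = xyz, |y| >= 1 and |xy| <= p (conditions 2 and 3)."""
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):  # x_len < xy_len guarantees |y| >= 1
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]


def every_split_fails(p: int, i: int = 2) -> bool:
    """True if x y^i z falls outside F for every admissible split of s = 0^p 1 0^p 1."""
    s = "0" * p + "1" + "0" * p + "1"
    assert in_F(s)  # the chosen s really is in F
    return all(not in_F(x + y * i + z) for x, y, z in admissible_splits(s, p))


for p in range(1, 8):
    print(p, every_split_fails(p))
```

For each p tried, `every_split_fails(p)` returns True, matching the slide: y is always $0^k$, so the pumped string is $0^{p+k}10^p1$. Only the general argument covers every p.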
The pumping lemma says it must be in F if F were regular, but it is not in F ⇒ the PL does not hold for F ⇒ F is not regular.
(Without using property 3) this is a little more difficult.)

Using closure properties of regular languages with the pumping lemma
Let $C = \{ w \mid w \text{ has as many 0s as 1s} \}$.
Problem: if $xyz \in C$ with $y \in C$, then $xy^iz \in C$ for every i, so pumping alone may give no contradiction.
Idea: if C is regular and F is regular, then the intersection $C \cap F$ has to be regular as well.
Solution: Assume as usual that C is regular. Take the regular language $F = \{ 0^n1^m \mid n, m \in \mathbb{N} \}$ (i.e. $0^*1^*$). Then the intersection $C \cap F = \{ 0^n1^n \mid n \in \mathbb{N} \}$ would be regular too.
But we already know that $C \cap F$ is not regular.
Conclusion: C is not regular.

Pumping down: $E = \{ 0^i1^j \mid i \ge j \}$
Problem: "pumping up" $s = 0^p1^p$ with $y = 0^k$ gives $xyyz = 0^{p+k}1^p$, $xy^3z = 0^{p+2k}1^p$, ..., which are all in E (hence give no contradiction).
Solution: pump down to $xz = 0^{p-k}1^p$.
Overall, for $s = xyz = 0^p1^p$ (with $|xy| \le p$): $y = 0^k$ with $k \ge 1$, hence $xz = 0^{p-k}1^p \notin E$.
Contradiction: E is not regular.

Pumping lemma review: steps to prove some language L is not regular
• You know there is a pumping length p for L (as long as it is an infinite language).
• You choose any string s in L you like, of length p or longer.
• There have to exist x, y, z (satisfying the criteria) such that s = xyz.
• You choose i in $xy^iz$, and prove that the resulting string cannot be in L.
• i can be 0 or any positive integer.

Examples of PL use (language; string; i; problem with the pumped string)
• $\{a^nb^n \mid n \ge 0\}$; $a^pb^p$; i = 2; no matter where y falls, the pumped string has the wrong form (or use property 3)
• $\{ww\}$; $0^p10^p1$; i = 2; the second half ends in 1, the first half in 0
• $\{0^m1^n \mid m > n\}$; $0^{p+1}1^p$; i = 0; reducing the number of 0s means $n \ge m$
• $\{x \mid x$ has an equal number of a's and b's$\}$; no pumping needed: use closure under intersection and the AnBn result

Mealy machines (+ Moore)
• On the assignment you looked at Mealy machines, aka finite state transducers or sequential circuits.
• A DFA with output rather than just accept/reject.
• Transitions have both input and output symbols from finite alphabets, separated by /.
• Very widely used for controls, e.g. elevators, vending machines, traffic lights, alarm systems, simple codes, protocols, process control.
• The input alphabet can be signals from sensors and the output alphabet can be signals to actuators.
• Sold off the shelf to be programmed: PLCs (programmable logic controllers).

Simple Mealy controllers
Vending machine: a Mealy machine to dispense candy after the deposit of 3 nickels, then extended to allow dimes.
[Figures: Mealy machines with states S0 (start), \$0.05 and \$0.10 and transitions labelled nickel/, nickel/release lock, dime/ and dime/release lock; the second version is extended to allow dimes]

Break
• When we come back:
  – Start of Chapter 2, context-free languages
  – Review of the test

Chapter 2: Context-free languages
• Context-free languages (CFL)
• Context-free grammars (CFG): derivations, parse trees, ambiguity
• Chomsky normal form of a CFG
• RL ⊂ CFL
6/9/2014 CSE 2001, Fall 2013

Context-Free Languages
• Context-free languages (CFLs) come from a more powerful (augmented) model than finite automata.
• CFLs allow us to describe non-regular languages like $\{0^n1^n \mid n \ge 0\}$.
• General idea: CFLs are the languages that can be recognized by (nondeterministic) finite automata with a single stack added:
  – $\{0^n1^n \mid n \ge 0\}$ is a CFL
  – $\{0^n1^n0^n \mid n \ge 0\}$ is not a CFL

Context-Free Grammars
• Grammars: a new way to define/specify a language.
• They use substitution rules, aka productions, that can be applied repeatedly, starting from a start symbol.
• What simple process produces the non-regular language $\{0^n1^n \mid n \in \mathbb{N}\}$?
  Start symbol S with rewriting rules: 1) S → 0S1  2) S → ε
  S yields $0^n1^n$ for any n: $S \Rightarrow 0S1 \Rightarrow 00S11 \Rightarrow \cdots \Rightarrow 0^nS1^n \Rightarrow 0^n1^n$

Context-Free Grammars (Definition)
A context-free grammar G = (V, Σ, R, S) is defined by
• V: a finite set of variables (or non-terminals)
• Σ: a finite set of terminal symbols (with V ∩ Σ = ∅)
• R: a finite set of substitution rules V → (V ∪ Σ)*
• S: the start symbol, S ∈ V (usually the left side of the topmost rule)
Example: ({S}, {a, b}, R, S), where R consists of the two rules S → aSb and S → ε (in one line, using shorthand: S → aSb | ε).

Derivation ⇒*
• A single-step derivation "⇒" consists of the substitution of a variable by a string according to one of the substitution rules.
  Example: using the rule A → BB, we can have the derivation 01AB0 ⇒ 01BBB0.
• A sequence of several derivation steps (or none) is indicated by "⇒*".
  Same example: 0AA ⇒* 0BBBB.
• For the grammar with the rules S → 0S1 | ε there is a derivation of the string 000111 as follows:
  S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000111

Derivations, formally
• If u, v, w are strings of variables and terminals and A → w is a rule of the grammar, we say that uAv yields uwv, written uAv ⇒ uwv.
• We say that u derives v, written u ⇒* v, if u = v or if there is a sequence $u_1, u_2, \ldots, u_k$ ($k \ge 0$) such that $u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \cdots \Rightarrow u_k \Rightarrow v$.
• The language of grammar G with start symbol S is denoted by L(G): $L(G) = \{ w \in \Sigma^* \mid S \Rightarrow^* w \}$.

Some remarks
• The language $L(G) = \{ w \in \Sigma^* \mid S \Rightarrow^* w \}$ contains only strings of terminals, not variables.
• Notation: we summarize several rules, like A → B, A → 01 and A → AA, by A → B | 01 | AA.
• Unless stated otherwise, the topmost rule concerns the start variable.
• Usually variables are written in upper case and terminals in lower case.

Context-Free Grammars (Example)
Consider the CFG G = (V, Σ, R, S) with V = {S, Z}, Σ = {0, 1} and
R: S → 0S1 | Z
   Z → 0Z | ε
Then $L(G) = \{ 0^i1^j \mid i \ge j \}$.
S yields the string $0^{j+k}1^j$ of 0s and 1s according to:
$S \Rightarrow 0S1 \Rightarrow \cdots \Rightarrow 0^jS1^j \Rightarrow 0^jZ1^j \Rightarrow 0^j0Z1^j \Rightarrow \cdots \Rightarrow 0^{j+k}Z1^j \Rightarrow 0^{j+k}\varepsilon 1^j = 0^{j+k}1^j$

Importance of CFLs
• Model for natural languages (Noam Chomsky)
• Specification of programming languages: "parsing of a computer program"
• Describing mathematical structures
• Intermediate between the regular languages and the computable languages (Chapters 3, 4, 5 and 6)

Some closure properties of the context-free languages
• It is easy to see that the CFLs are closed under union, concatenation and star.

Example: Boolean expressions
Consider the CFG G = (V, Σ, R, S) with V = {S}, Σ = {0, 1, (, ), ¬, ∨, ∧} and
R: S → 0 | 1 | (¬S) | (S∨S) | (S∧S)
Some elements of L(G): 0, (((¬0)∨1)∧(1∧1)), (1∨(0∧0))
Note: the parentheses prevent the "1∨0∧0" confusion. Consider instead S → 0 | 1 | S∨S | S∧S, in which 1∨0∧0 can be read either as (1∨0)∧0 or as 1∨(0∧0).

Human languages
• Variables are enclosed in angle brackets; terminals are strings of English words.
• Rules:
  <SENTENCE> → <NOUN-PHRASE><VERB-PHRASE>
  <NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE>
  <VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE>
  <CMPLX-NOUN> → <ARTICLE><NOUN>
  <CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE>
  …
  <ARTICLE> → a | the
  <NOUN> → boy | girl | house
  <VERB> → sees | ignores
• Possible element: the boy sees the girl

Parse trees and leftmost derivations
• There is another method to show the derivation of a string pictorially, called a parse tree.
  [Figure: parse tree for 0011 under S → 0S1 | ε]
• If a derivation always rewrites the leftmost variable in the string, the derivation is called a leftmost derivation.
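Leftmost derivations also give a simple, if exponential, way to list the short strings of a grammar: breadth-first search over sentential forms, always rewriting the leftmost variable. The Python sketch below does this for the CFG example above, assuming its rules are S → 0S1 | Z and Z → 0Z | ε (the rules its derivation uses), and checks the claim $L(G) = \{0^i1^j \mid i \ge j\}$ on all strings of length at most 6. `GRAMMAR`, `leftmost_successors` and `words_up_to` are hypothetical names chosen for the illustration.

```python
from collections import deque

# Assumed reading of the CFG example: S -> 0S1 | Z, Z -> 0Z | ε.
# Upper-case letters are variables; the empty string "" encodes ε.
GRAMMAR = {"S": ["0S1", "Z"], "Z": ["0Z", ""]}


def leftmost_successors(form: str, grammar: dict) -> list:
    """All forms reachable in one step by rewriting the leftmost variable of `form`."""
    for pos, sym in enumerate(form):
        if sym in grammar:
            return [form[:pos] + rhs + form[pos + 1:] for rhs in grammar[sym]]
    return []  # no variable left: `form` is already a terminal string


def words_up_to(grammar: dict, start: str, max_len: int) -> set:
    """Terminal strings of length <= max_len that have a leftmost derivation from `start`.

    Breadth-first search over sentential forms, discarding forms with more than
    max_len terminals. This terminates for the grammar above; a grammar whose rules
    can add variables without adding terminals (such as S -> SS) would need an
    extra bound on the length of the forms.
    """
    seen, words, queue = {start}, set(), deque([start])
    while queue:
        form = queue.popleft()
        successors = leftmost_successors(form, grammar)
        if not successors:
            words.add(form)
        for nxt in successors:
            n_terminals = sum(1 for sym in nxt if sym not in grammar)
            if n_terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


# Compare with { 0^i 1^j | i >= j } restricted to strings of length <= 6.
expected = {"0" * i + "1" * j for i in range(7) for j in range(i + 1) if i + j <= 6}
print(words_up_to(GRAMMAR, "S", 6) == expected)
```

Because only the leftmost variable is ever rewritten, each terminal string is reached once per leftmost derivation, which is the same bookkeeping the parse-tree correspondence relies on.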
There is a one-to-one correspondence between leftmost derivations and parse trees.

Parse trees
The parse tree of (0)∨((0)∧(1)) via the rules S → 0 | 1 | ¬(S) | (S)∨(S) | (S)∧(S):
[Figure: parse tree of (0)∨((0)∧(1)); the root uses S → (S)∨(S), and its right child S uses S → (S)∧(S)]

Ambiguity
• A grammar is ambiguous if some strings are derived ambiguously.
• A string is derived ambiguously if it has more than one leftmost derivation.
• Typical example: the rules S → 0 | 1 | S+S | S×S, with
  S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
  versus
  S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1

Ambiguity and parse trees
The ambiguity of 0×1+1 is shown by its two different parse trees:
[Figure: one tree with root rule S → S+S whose left subtree derives 0×1; one with root rule S → S×S whose right subtree derives 1+1]

More on ambiguity
• The two different derivations S ⇒ S+S ⇒ 0+S ⇒ 0+1 and S ⇒ S+S ⇒ S+1 ⇒ 0+1 do not make 0+1 an ambiguously derived string: they are not both leftmost derivations, and they have the same parse tree.
• However, the above grammar does produce ambiguous strings. In this case there are other grammars for the same language that are not ambiguous.
• Languages that can only be generated by ambiguous grammars are "inherently ambiguous".

Inherent ambiguity
• Some context-free languages are inherently ambiguous, for example $\{a^ib^jc^k \mid i = j \text{ or } j = k\}$.
• The grammar for simple arithmetic expressions given at the bottom of page 105 is ambiguous, but the language it describes is not inherently ambiguous. See Example 2.4 and the note at the top of page 104.

Context-free languages
• Any language that can be generated by a context-free grammar is a context-free language (CFL).
• The CFL $\{0^n1^n \mid n \ge 0\}$ shows us that certain CFLs are non-regular languages.
• Q1: Are all regular languages context-free?
• Q2: Which languages are outside the class CFL?
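Because parse trees and leftmost derivations are in one-to-one correspondence, the ambiguity of 0×1+1 under S → 0 | 1 | S+S | S×S from the ambiguity slides can be checked mechanically by counting parse trees. A minimal Python sketch; `count_parse_trees` is a hypothetical helper, and memoization via `functools.lru_cache` keeps the recursion from recomputing the same substrings.

```python
from functools import lru_cache

# Grammar from the ambiguity slides: S -> 0 | 1 | S+S | S×S
ATOMS = {"0", "1"}
OPERATORS = {"+", "×"}


@lru_cache(maxsize=None)
def count_parse_trees(s: str) -> int:
    """Number of distinct parse trees of s with root S.

    The root uses either S -> 0 / S -> 1 (s is a single atom) or S -> S op S,
    with op at some position k of s; the two subtrees then derive s[:k] and
    s[k+1:], and their counts multiply. Parse trees correspond one-to-one to
    leftmost derivations, so a count above 1 means s is derived ambiguously.
    """
    total = 1 if s in ATOMS else 0
    for k, ch in enumerate(s):
        if ch in OPERATORS:
            total += count_parse_trees(s[:k]) * count_parse_trees(s[k + 1:])
    return total


for w in ["0+1", "0×1+1", "0+1×1+0"]:
    print(w, count_parse_trees(w))
```

It reports 1 tree for 0+1, 2 for 0×1+1 (the two leftmost derivations shown on the Ambiguity slide) and 5 for 0+1×1+0. The count only shows that this particular grammar is ambiguous; the language itself also has unambiguous grammars.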
Chomsky Normal Form
A context-free grammar G = (V, Σ, R, S) is in Chomsky normal form if every rule is of the form
  A → BC or A → x
with variables A ∈ V and B, C ∈ V \ {S}, and x ∈ Σ.
For the start variable S we also allow the rule S → ε (but the start symbol may not appear on the right-hand side of any rule).
Advantage: grammars in this form are far easier to analyze.

Theorem 2.9
Every context-free language can be described by a grammar in Chomsky normal form.
Outline of proof: we can rewrite any CFG into an equivalent grammar in Chomsky normal form by replacing, one by one, every rule that is not "Chomsky". We have to take care of the start symbol, the ε-rules, and all other violating rules.

Proof of Theorem 2.9
Given a context-free grammar G = (V, Σ, R, S), rewrite it to Chomsky normal form by
1) Adding a new start symbol $S_0$ (and the rule $S_0 \to S$).
2) Removing A → ε rules (from the tail): before: B → xAy and A → ε; after: B → xAy | xy.
3) Removing unit rules A → B (by the head): "A → B" and "B → xCy" become "A → xCy" and "B → xCy".
4) Shortening all rules to two symbols: before: $A \to B_1B_2\cdots B_k$; after: $A \to B_1A_1$, $A_1 \to B_2A_2$, …, $A_{k-2} \to B_{k-1}B_k$.
5) Replacing ill-placed terminals a by $T_a$, with the rule $T_a \to a$.

Careful removing of rules
• Do not reintroduce rules that you removed earlier.
• Example: A → A simply disappears.
• When removing A → ε rules, insert all new replacements: B → AaA becomes B → AaA | aA | Aa | a.

Example of Chomsky normal form
Initial grammar: S → aSb | ε
In Chomsky normal form:
  $S_0 \to \varepsilon \mid T_aT_b \mid T_aX$
  $X \to ST_b$
  $S \to T_aT_b \mid T_aX$
  $T_a \to a$
  $T_b \to b$

RL ⊆ CFL
Every regular language can be expressed by a context-free grammar.
Proof idea: Given a DFA $M = (Q, \Sigma, \delta, q_0, F)$, we construct a corresponding context-free grammar $G_M = (V, \Sigma, R, S)$ with $V = Q$ and $S = q_0$.
Rules of $G_M$:
• $q_i \to x\,\delta(q_i, x)$ for all $q_i \in V$ and all $x \in \Sigma$
• $q_i \to \varepsilon$ for all $q_i \in F$

Example: RL ⊆ CFL
[Figure: DFA with states $q_1$ (start), $q_2$ (accepting) and $q_3$; $q_1$ loops on 0 and moves to $q_2$ on 1; $q_2$ loops on 1 and moves to $q_3$ on 0; $q_3$ moves to $q_2$ on 0,1]
This DFA leads to the context-free grammar $G_M = (Q, \Sigma, R, q_1)$ with the rules
  $q_1 \to 0q_1 \mid 1q_2$
  $q_2 \to 0q_3 \mid 1q_2 \mid \varepsilon$
  $q_3 \to 0q_2 \mid 1q_2$

Picture thus far
[Figure: the regular languages nested inside the context-free languages, with $\{0^n1^n\}$ context-free but not regular, and "??" outside the context-free languages]

Summary
• Every regular language can be represented by a regular expression.
• There are languages that aren't regular:
  – AnBn isn't regular
  – "ww" isn't regular
  – The pumping lemma holds for all regular languages
  – Using the pumping lemma (and closure properties of regular languages) you can prove some languages aren't regular
• Context-free languages and grammars (all of 2.1 and some of 2.3):
  – Productions, derivations, parse trees, closure properties
  – Ambiguity
  – Chomsky normal form
• Before next time go over Chapter 2.1, skim 2.3, and especially Chapter 3.1; there will be a lot to absorb in one session.

Test 1
• The average was 13.4.
• Your mark times 4 gives you your grade on York's grade scale, e.g. a pass is 12.5.
• Colleagues thought the test was too easy.
• Next time we will have a longer time and lots more material; some parts will be similar in style, but we will also add some "think" or "apply" questions.
• It will cover everything from the start of term up to the week before, so you really have to get going learning Chapters 2.1 and 3, as well as make up anything missing from the first two chapters.
• Do the exercises, go to office hours and tutorials, read the book, don't get behind.

Test comments
• Don't mark up your test now; keep separate notes.
  You could get confused about what your original answer was.
• There were some minor variations between the two versions, so if a question seems out of order, has a's and b's instead of 0's and 1's, or asks for a different definition, you'll have to adjust.
• Check the overall addition, etc.
• To request a re-grade of a question, write down your reason (along with a statement that you have not modified your test booklet) and hand it in to me next week.
• I have already regraded question 8 extensively.

Questions: Version A, page 2
1. The formal definition of acceptance: see Sipser items 1, 2, 3 on page 40 (2nd edition). (This is important to know; analyzing a computation as a sequence of steps, state by state, is central.)
   1. $r_0 = q_0$
   2. $\delta(r_i, w_{i+1}) = r_{i+1}$ for $0 \le i \le n-1$
   3. $r_n \in F$
2. Definition of the concatenation of A and B, or of the star of A:
   $A \circ B = \{ xy \mid x \in A \text{ and } y \in B \}$
   $A^* = \{\varepsilon\} \cup A \cup A \circ A \cup A \circ A \circ A \cup \cdots$ (no end)
3. i.e. the strings with an even number of ones: $0^*(10^*10^*)^*$

Test 1, pages 3 and 4
4. ba: {q0, q1, q2, q3}; ab: {q2, q1}; abb: {q2, q3}; abba: {q1, q0}
5. No; consider the string "a": it would be accepted by both M and N.
6. {w | w is an even-length string of a's and b's ending in aa} (I assume b doesn't need to appear). Two easy ways:
   1. Draw 2 NFAs, one for even-length strings and one for strings ending in "aa"; both languages are regular, so their intersection is regular.
   2. Regular expression: $((a \cup b)(a \cup b))^*aa$

Test 1, page 5
7. (aUb)c: 2 points for each part. If you did not use something like the construction asked for, but your answer is otherwise correct, you get 0.5/2. There were many incorrect answers for the union: you have to use ε-transitions from all former accept states.
8. Badly done. There was ambiguity over "even multiple"; it would have been better for me to say "evenly divisible" rather than a multiple that is even, and the first meaning is the language that the DFA recognizes. I re-marked it twice. Induction is a basic tool.
   I was looking for something like:
   Induction hypothesis: for any i ≥ 0, a string with i a's and any number of b's brings M to state $q_{i \bmod 3}$.
   Alternate (not as good): for any n ≥ 0, any string containing 3n a's (or 6n) and any number of b's takes M to the (accepting) state $q_0$; any string containing 3n+1 a's takes M to the (non-accepting) state $q_1$; and any string containing 3n+2 a's takes M to the (non-accepting) state $q_2$...

Exercises for week 5
• Chapter 1:
  – Try Sipser, Exercises 1.21a or b, 1.29b, 1.30
  – Problems: 1.31, 1.34, 1.40b?, 1.43, 1.46c or d
• Chapter 2: 2.4e, 2.6b, 2.8, 2.14, 2.16
• Others: 2.15, 2.17, 2.9, 2.14, 2.26
• A pretty hard grammar to write: 2.22