Closure Properties of CFLs

If A and B are context-free languages, then is…
• A^R a context-free language?
• A* a context-free language?
• Ā a context-free language (complement)?
• A ∪ B a context-free language?
• A ∩ B a context-free language?
Some of these are true. Some of them are false.

CFLs Closed Under Reverse

Given a CFL A, is A^R a CFL?
Since A is a CFL, there is some CFG G that generates A.
Proof by construction: there is a CFG G^R that generates A^R.
G = (V, Σ, R, S)
G^R = (V, Σ, R^R, S)
R^R = { A → α^R | A → α ∈ R }

CFLs Closed Under *

Given a CFL A, is A* a CFL?
Since A is a CFL, there is some CFG G that generates A.
Proof by construction: there is a CFG G* that generates A*.
G = (V, Σ, R, S)
G* = (V ∪ {S0}, Σ, R*, S0), where S0 is a new start variable
R* = R ∪ { S0 → S } ∪ { S0 → S0S0 } ∪ { S0 → ε }

Closure Properties of CFLs

If A and B are context-free languages, then
• A^R is a context-free language TRUE
• A* is a context-free language TRUE
• Ā is a context-free language (complement)?
• A ∪ B is a context-free language?
• A ∩ B is a context-free language?

CFLs Closed Under Union

Given two CFLs A and B, is A ∪ B a CFL?
Proof by construction: there is a CFG G_{A∪B} that generates A ∪ B.
Since A and B are CFLs, there are CFGs G_A = (V_A, Σ_A, R_A, S_A) and G_B = (V_B, Σ_B, R_B, S_B) that generate A and B.
G_{A∪B} = (V_A ∪ V_B ∪ {S0}, Σ_A ∪ Σ_B, R_{A∪B}, S0), where S0 is a new start variable
R_{A∪B} = R_A ∪ R_B ∪ { S0 → S_A } ∪ { S0 → S_B }
Assumes V_A and V_B are disjoint (easy to arrange by renaming variables).

CFLs Closed Under Concatenation

Given two CFLs A and B, is A•B a CFL?
Proof by construction: there is a CFG G_AB that generates A•B.
Since A and B are CFLs, there are CFGs G_A = (V_A, Σ_A, R_A, S_A) and G_B = (V_B, Σ_B, R_B, S_B) that generate A and B.
Construct G_AB = (V_AB, Σ_A ∪ Σ_B, R_AB, S_AB):
– rename elements of V_B so that V_A ∩ V_B = ∅
– define V_AB = V_A ∪ V_B ∪ {S_AB}, where S_AB ∉ V_A ∪ V_B
– define R_AB = R_A ∪ R_B ∪ { S_AB → S_A S_B }

Closure Properties of CFLs

If A and B are context-free languages, then:
• A^R is a context-free language TRUE
• A* is a context-free language TRUE
• Ā is a context-free language (complement)?
• A ∪ B is a context-free language TRUE
• A ∩ B is a context-free language?

Non-closure Under Intersection

CFLs are not closed under intersection
• Example:
  – L = {a^n b^n c^n | n ≥ 1} is not context-free
  – L1 = {a^n b^n c^i | n ≥ 1, i ≥ 1} and L2 = {a^i b^n c^n | n ≥ 1, i ≥ 1} are CFLs with corresponding grammars:
    • L1: S → AB; A → aAb | ab; B → cB | c
    • L2: S → AB; A → aA | a; B → bBc | bc
  – However, L = L1 ∩ L2
  – Thus the intersection of two CFLs need not be a CFL

Non-closure Under Intersection

Another example
• The language L = {0^i 1^j 2^k 3^l | i = k and j = l} is not a CFL
  – Intuitively, you need a variable and productions like A → 0A2 | 02 to generate the matching 0's and 2's, while you need another variable to generate the matching 1's and 3's. But these variables would have to generate strings that do not interleave
• However, the simpler language {0^i 1^j 2^k 3^l | i = k} is a CFL
  – A grammar: S → S3 | A; A → 0A2 | B; B → 1B | ε
• Likewise, {0^i 1^j 2^k 3^l | j = l} is a CFL
• Their intersection is L

Intersection with Regular Languages

• Theorem: If L is a CFL and R is a regular language, then L ∩ R is a CFL
[Figure: an FA and a PDA (with its stack) run in parallel on the same input; their accept/reject outputs are combined with AND.]

Closure Properties of CFLs

If A and B are context-free languages, then:
• A^R is a context-free language TRUE
• A* is a context-free language TRUE
• Ā is a context-free language (complement)?
• A ∪ B is a context-free language TRUE
• A ∩ B is a context-free language FALSE

Closure under Complement?

• The complements of some CFLs are also CFLs
• Example: {a^n b^n | n ≥ 0}
• Its complement can be accepted by a PDA:
  – swap the accepting and non-accepting states of a deterministic PDA that recognizes a^n b^n (this works only for a deterministic PDA that always reads its whole input)

Non-closure of CFL's Under Complement

But not always! The complement of the non-CFL L = {0^i 1^j 2^k 3^l | i = k and j = l} is a CFL (what is the complement of L?)
Here is a PDA P recognizing it:
• Non-deterministically choose whether to check i ≠ k or j ≠ l.
  – The PDA is non-deterministic: each run checks one or the other, but it is capable of checking either one
• Say we want to check i ≠ k. As long as 0's come in, count them on the stack. Ignore 1's. Pop the stack for each 2.
  As long as we have not just exposed the bottom-of-stack marker when the first 3 comes in, accept, and keep accepting as long as 3's come in.
• But we also have to accept, and keep accepting, as soon as we see that the input is not in L(0*1*2*3*).

Closure Properties of CFLs

If A and B are context-free languages, then:
• A^R is a context-free language TRUE
• A* is a context-free language TRUE
• Ā is a context-free language MAYBE
• A ∪ B is a context-free language TRUE
• A ∩ B is a context-free language MAYBE

Closure Properties of CFLs

• CFLs are closed under reversal, Kleene star, union, and concatenation
• CFLs are not closed under intersection and complement

Using Closure to Prove a Language is not Context-free

L = {w in {a,b,c}* with equal numbers of a's, b's, and c's}
• Suppose L is context-free.
• Consider L1 = L ∩ a*b*c*
• Because context-free languages are closed under intersection with regular languages, L1 must be context-free
• But L1 is {a^n b^n c^n | n ≥ 0}, which we know not to be context-free
• So we must have been wrong in our assumption that L is context-free

Using Closure to Prove a Language is not Context-free

L = {www | w ∈ {a,b}*}
• Suppose L is context-free
• Intersect L with a*ba*ba*b to get L1 = {a^n b a^n b a^n b | n ≥ 0}
• If L1 is not context-free (can prove it is not with the pumping lemma), L is not context-free either

Using Closure to Prove a Language is not Context-free

• In general:
  – Can simplify showing that a language is not context-free by using closure properties
  – Assume L is context-free
  – Transform it to a simpler language, L', by using some operation(s) under which CFLs are closed
  – Use the pumping lemma to show L' is not context-free, so neither is L

Testing Emptiness of a CFL

• As for regular languages, we really take a representation of some language and ask whether it represents ∅
  – In this case, the representation can be a CFG or a PDA
    • Our choice, since there are algorithms to convert one to the other
  – The test: use a CFG; check if the start symbol is useless (that is, it derives no terminal string)

Testing Finiteness of a CFL
Let L be a CFL. Then there is some pumping lemma constant n for L
• Test all strings of length between n and 2n - 1 for membership
• If there is any such string, it can be pumped, and the language is infinite
• If there is no such string, then n - 1 is an upper limit on the length of strings, so the language is finite
  – Trick: if there were a string z = uvwxy of length 2n or longer, you can find a shorter string uwy in L, but it's at most n shorter (why?). Thus, if there are any strings of length 2n or more, you can repeatedly cut out v and x to get, eventually, a string whose length is in the range n to 2n - 1.

Testing Membership of a String in a CFL

• Simulating a PDA for L on string w doesn't quite work, because the PDA can grow its stack indefinitely on ε-moves (without consuming input), and we never finish, even if the PDA is deterministic
• There is an O(n^3) algorithm (n = length of w) that uses a "dynamic programming" technique.
  – Called the Cocke-Younger-Kasami (CYK) algorithm.

CYK Algorithm

• Start with a CNF grammar for L
• Build a two-dimensional table:
  – Row = length of a substring of w
  – Column = beginning position of the substring
  – Entry in row i and column j = set of variables that generate the substring of w beginning at position j and extending for i positions
  – These entries are denoted X_{j,i+j-1}, i.e., the subscripts are the first and last positions of the string represented, so the first row is X_11, X_22, …, X_nn, the second row is X_12, X_23, …, X_{n-1,n}, and so on

Table

• The horizontal axis corresponds to the positions of the string w = a_1 a_2 … a_n
• Table entry X_ij is the set of nonterminals A such that A ⇒* a_i a_{i+1} … a_j
  – We are particularly interested in whether S is in X_1n, because that is the same as saying S ⇒* w (that is, w is in L)
• Basis: (row 1) X_ii = the set of variables A such that A → a is a production, and a is the symbol at position i of w.
  – The grammar is in CNF, therefore the only way to derive a terminal is with a production of the form A → a, so X_ii is the set of nonterminals A such that A → a_i is a production of G
• Induction: Suppose we want to compute X_ij, which is in row j - i + 1
  – We can derive a_i a_{i+1} … a_j from A if there is a production A → BC, B derives some nonempty proper prefix of a_i a_{i+1} … a_j, and C derives the rest.
  – Thus, A is in X_ij if there is a production A → BC and a value of k such that
    • i ≤ k < j
    • B is in X_ik
    • C is in X_{k+1,j}

Example

• We'll use the algorithm to determine if the string w = aabbb is in the language generated by the grammar
  S → AB
  A → BB | a
  B → AB | b
• Note that the symbol at position 1 of w is a, so X_11 is the set of all variables that immediately derive a, that is, X_11 = {A}. Since the symbol at position 2 is also a, we have X_22 = {A}, and so on, to get X_11 = {A}, X_22 = {A}, X_33 = {B}, X_44 = {B}, X_55 = {B}

           a          a          b          b          b
  Row 1:   X_11={A}   X_22={A}   X_33={B}   X_44={B}   X_55={B}
  Row 2:   X_12       X_23       X_34       X_45
  Row 3:   X_13       X_24       X_35
  Row 4:   X_14       X_25
  Row 5:   X_15

• Compute X_12: since X_11 = {A} and X_22 = {A}, X_12 consists of all variables on the left side of a production whose right side is AA. There are none, so X_12 is empty.
• Next, X_23 = {X | X → YZ is a production, Y ∈ X_22, Z ∈ X_33}. Since X_22 = {A} and X_33 = {B}, the required right side is AB, thus X_23 = {S, B}
• The rest is easy:
  – X_12 = ∅, X_23 = {S,B}, X_34 = {A}, X_45 = {A}
  – X_13 = {S,B}, X_24 = {A}, X_35 = {S,B}
  – X_14 = {A}, X_25 = {S,B}
  – X_15 = {S,B}
• Since S is in X_15, w ∈ L(G)

  S → AB
  A → BB | a
  B → AB | b

           a            a            b            b          b
  Row 1:   X_11={A}     X_22={A}     X_33={B}     X_44={B}   X_55={B}
  Row 2:   X_12=∅       X_23={S,B}   X_34={A}     X_45={A}
  Row 3:   X_13={S,B}   X_24={A}     X_35={S,B}
  Row 4:   X_14={A}     X_25={S,B}
  Row 5:   X_15={S,B}

Another Example

• X → aXb | ab
• Step 1: put the grammar into CNF
• Step 2: apply the CYK algorithm to aaabbb
[Table: blank CYK table for aaabbb, rows 1 to 6, entries X_11 … X_66 in row 1 down to X_16 in row 6, to be filled in.]

Another Example

  S → AB | BC
  A → BA | a
  B → CC | b
  C → AB | a
• Test for the string baaba
[Table: blank CYK table for baaba, rows 1 to 5, entries X_11 … X_55 in row 1 down to X_15 in row 5, to be filled in.]

CYK as a Parsing Algorithm

• The applicability of the CYK algorithm as a parser is limited by the computational requirements needed to find a derivation
  – For an input string of length n, (n^2 + n)/2 sets need to be constructed to complete the dynamic programming table
  – Each of these sets may require the consideration of several decompositions of the associated substring

Preview of Undecidable CFL Problems

• Is a given CFG ambiguous?
• Is a given CFL inherently ambiguous?
• Is the intersection of two CFLs empty?
• Are two CFLs the same?
• Is a given CFL equal to Σ*, where Σ is the alphabet of the language?

The Chomsky Hierarchy

[Figure: nested language classes, each paired with the machine model that recognizes it]
• Recursively enumerable languages: Turing machines
• Context-sensitive languages: linear bounded automata
• Context-free languages: pushdown automata
• Regular languages: finite automata

Context-Sensitive Grammars

The next grammar type, more powerful than CFGs, is a "somewhat restricted" grammar.
A grammar is context-sensitive if all productions are of the form x → y, where x, y are in (V ∪ T)+ and |x| ≤ |y|
• Fundamental property:
  – the grammar is non-contracting, i.e., the length of successive sentential forms can never decrease
• Why "context-sensitive"?
• All productions can be rewritten in a normal form xAy → xvy
• Effectively, "A can be replaced by v only in the context of a preceding x and a following y"

Example

• CSG for {a^n b^n c^n | n ≥ 1}
  S → abc | aAbc
  Ab → bA
  Ac → Bbcc
  bB → Bb
  aB → aa | aaA
• Try to derive a^3 b^3 c^3:
  S ⇒ aAbc ⇒ abAc ⇒ abBbcc ⇒ aBbbcc ⇒ aaAbbcc ⇒ aabAbcc ⇒ aabbAcc ⇒ aabbBbccc ⇒ aabBbbccc ⇒ aaBbbbccc ⇒ aaabbbccc
• A and B are "messengers": an A is created on the left, travels to the right to the first c, and creates another b and c. It then sends B back to create the corresponding a. This is similar to the way one would program a TM to accept the language.

Linear-Bounded Automata

A limited Turing machine in which tape use is restricted
• Use only the part of the tape occupied by the input
• I.e., it has an unbounded tape, but the amount that can be used is a function of the input
• Restrict the usable part of the tape to exactly the cells taken by the input
An LBA is assumed to be nondeterministic.

Relation between CSLs and LBAs

If a language L is accepted by some linear bounded automaton, then there is a context-sensitive grammar that generates L
• The length of every sentential form in a derivation of w from a CSG is bounded by a function of |w| (in fact by |w| itself), because any CSG G is non-contracting
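
The CSG example and the non-contracting property above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the names RULES, successors, is_valid_derivation, generates and DERIVATION are hypothetical helpers chosen for illustration. It encodes the example grammar as string-rewriting rules, verifies the derivation of a^3 b^3 c^3 step by step, and decides membership by searching only sentential forms no longer than the input, which is safe precisely because no production shrinks a form.

```python
from collections import deque

# Productions of the example CSG for {a^n b^n c^n | n >= 1}.
# Uppercase letters are variables, lowercase letters are terminals.
RULES = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"),
    ("Ac", "Bbcc"),
    ("bB", "Bb"),
    ("aB", "aa"), ("aB", "aaA"),
]


def successors(form):
    """Return every sentential form reachable from `form` in one step."""
    result = set()
    for lhs, rhs in RULES:
        start = form.find(lhs)
        while start != -1:
            # Rewrite this occurrence of the left-hand side.
            result.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return result


def is_valid_derivation(steps):
    """Check that each form follows from the previous one by one production."""
    return all(nxt in successors(cur) for cur, nxt in zip(steps, steps[1:]))


def generates(word):
    """Breadth-first search over sentential forms no longer than `word`.

    Because no production shrinks a form, any form longer than `word`
    can be discarded, so the search space is finite.
    """
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for nxt in successors(form):
            if len(nxt) <= len(word) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# The derivation of a^3 b^3 c^3 given in the example.
DERIVATION = [
    "S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc", "aabAbcc",
    "aabbAcc", "aabbBbccc", "aabBbbccc", "aaBbbbccc", "aaabbbccc",
]

if __name__ == "__main__":
    print(is_valid_derivation(DERIVATION))
    print(generates("aaabbbccc"), generates("aabbc"))
```

The length bound in generates mirrors the idea behind linear bounded automata: the work space needed to decide membership never exceeds the space taken by the input itself.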