Chapter 6: Context-Free and Non-Context-Free Languages
Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Introduction to Computation

The Pumping Lemma for Context-Free Languages
• It’s easy to find a language that cannot be accepted by a finite automaton, even if proving it is a little harder
  – For example, AnBn = {a^n b^n | n ≥ 0} cannot be accepted by an FA, because with only a finite number of states we can’t keep track of how many a’s we’ve seen
  – It might be argued, in a similar way, that neither AnBnCn = {a^n b^n c^n | n ≥ 0} nor XX = {xx | x ∈ {a,b}*} can be accepted by a PDA
• The way a PDA processes a^i b^j c^k allows it to confirm that i = j, but not to remember that number long enough to compare it to k
• One way to prove AnBn is not regular is to use the pumping lemma for regular languages
• Now we’ll establish a result for CFLs that is similar to that pumping lemma, but a little more complicated
• The basic idea is that a sufficiently long derivation in a grammar G will have to contain a self-embedded variable
• For instance, in S ⇒* vAz ⇒* vwAyz ⇒* vwxyz, the string derived from the first occurrence of A also includes an occurrence of A
  – S ⇒* vAz ⇒* vwAyz ⇒* v w^k A y^k z ⇒* v w^k x y^k z must also be a valid derivation, for every k ≥ 1, and S ⇒* vAz ⇒* vxz = v w^0 x y^0 z is also valid
  – This observation will be useful if we can guarantee that the strings w and y are not both null, and even more useful if we can impose some other restrictions on the five strings v, w, x, y, and z of terminals
  – We do this by requiring that the grammar be in Chomsky normal form (see Chapter 4)
• Theorem 6.1: Suppose L is a CFL
  – Then there is an integer n so that for every u ∈ L with |u| ≥ n, u can be written as u = vwxyz so that:
    • |wy| > 0
    • |wxy| ≤ n
    • For every m ≥ 0, v w^m x y^m z ∈ L
• Proof:
  – We can find a CFG G so that L(G) = L − {Λ} and G is in Chomsky normal form, so that the right side of every production is either a single terminal or a string of two variables
• Every derivation tree in this grammar is then a binary tree
• A binary tree of height h has no more than 2^h leaf nodes
  – Therefore if u ∈ L(G) and h is the height of the derivation tree for u, then |u| ≤ 2^h
• Let n be 2^{p+1}, where p is the number of distinct variables in G, and suppose that u is a string in L(G) of length at least n
  – Then it follows that every derivation tree for u must have height greater than p
• Thus, in a derivation tree for u, there must be a path from the root to a leaf node with at least p+1 interior nodes
  – Take a longest such path; its lowest p+1 interior nodes are labeled by only p distinct variables, so they must include the same variable twice; call it A
  – Let x be the substring of u derived from the lower occurrence of A on the path, and let w and y be the strings of terminals such that the substring of u derived from the occurrence of A farther from the leaf is wxy
  – Finally, let v and z be the prefix and suffix of u so that u = vwxyz
• The subtree starting at the higher occurrence of A has height at most p+1, thus |wxy| ≤ 2^{p+1} = n
• The leaf nodes corresponding to the symbols of x are descendants of only one of the two children of the higher occurrence of A
• Because G is in Chomsky normal form, the other child also has descendant leaf nodes
  – Therefore, w and y can’t both be Λ
• Finally, we have S ⇒* vAz ⇒* vwAyz ⇒* vwxyz, and we’ve already seen how this establishes the third part of the theorem
• Applying the pumping lemma to AnBnCn
  – Suppose, for the sake of contradiction, that AnBnCn is a context-free language, and let n be the integer in the pumping lemma
• Let u be the string a^n b^n c^n
  – Then u ∈ AnBnCn and |u| ≥ n
  – Therefore, according to the pumping lemma, u = vwxyz for some strings satisfying the three conditions
  – The first condition, |wy| > 0, implies that the string wy contains at least one symbol
  – The second, |wxy| ≤ n, implies that wxy contains no more than two distinct symbols
  – If σ1 is one of the three symbols that occurs in wy and σ2 is one that doesn’t, then the string v w^0 x y^0 z = vxz obtained from u by deleting w and y contains fewer than n occurrences of σ1 and exactly n occurrences of σ2
  – This is a contradiction, because the third condition implies that v w^0 x y^0 z is in AnBnCn and so must have equal numbers of all three symbols
• Theorem 6.7, Ogden’s Lemma (a stronger version of the pumping lemma):
  – Suppose L is a CFL.
Then there is an integer n so that for every u ∈ L with |u| ≥ n, and every choice of n or more “distinguished” positions in the string u, there are strings v, w, x, y, and z so that u = vwxyz and the following conditions are satisfied:
    • wy contains at least one symbol in a distinguished position
    • wxy contains n or fewer symbols in distinguished positions
    • For every m ≥ 0, v w^m x y^m z ∈ L
• Proof: see book

Intersections and Complements of CFLs
• The set of CFLs, like the set of regular languages, is closed under the operations of union, concatenation, and Kleene *
• Unlike the set of regular languages, it is not closed under intersection or difference
• Consider AnBnCn = {a^n b^n c^n | n ≥ 0}
  – This set is {a^i b^i c^k | i, k ≥ 0} ∩ {a^i b^j c^j | i, j ≥ 0}
  – The two simpler languages are CFLs but their intersection is not
• We know that XX = {xx | x ∈ {a,b}*} is not a CFL
  – Surprisingly, its complement is
• Let L be the complement of XX, i.e., L = {a,b}* − XX
  – All odd-length strings are in L
  – If x ∈ L and |x| = 2n for some n ≥ 1, then for some k with 1 ≤ k ≤ n, the kth and (n+k)th symbols are different (say a and b, respectively)
  – There are k − 1 symbols before the a, n − 1 symbols between them, and n − k symbols after the b
  – Think of the n − 1 symbols between the two as k − 1 and then n − k symbols
  – This means that x is the concatenation of two odd-length strings, one with a in the middle and k − 1 symbols on either side, and one with b in the middle and n − k symbols on either side
  – Furthermore, every such string is in L
  – Let G be the context-free grammar with productions
      S → A | B | AB | BA
      A → EAE | a
      B → EBE | b
      E → a | b
  – The variables A and B generate odd-length strings with middle symbol a and b, respectively, and together generate all odd-length strings
  – From AB and BA we can derive all the even-length elements of L
  – Therefore L = L(G), and L is a CFL whose complement (i.e., XX) is not a CFL
• Theorem 6.13: If L1 is a CFL and L2 is a regular language, then L1 ∩ L2 is a CFL.
• Proof: Let M1 = (Q1, Σ, Γ, q1, Z0, A1, δ1) be a PDA accepting L1 and M2 = (Q2, Σ, q2, A2, δ2) an FA accepting L2. The intuitive idea is that because the two machines involve only one stack between them, we can use the same construction involving the Cartesian product as in Theorem 2.15
• Define M = (Q1 × Q2, Σ, Γ, (q1, q2), Z0, A1 × A2, δ) as follows:
  – For σ ∈ Σ, δ((p, q), σ, Z) is the set of pairs ((p’, q’), α) for which (p’, α) ∈ δ1(p, σ, Z) and δ2(q, σ) = q’
  – δ((p, q), Λ, Z) is the set of pairs ((p’, q), α) for which (p’, α) ∈ δ1(p, Λ, Z)
  – This allows M to simulate the computation of M1, because for each move M consults the state of M1, the input, and the stack
  – It also allows M to simulate the computation of M2, which requires only the state of M2 and the input
• M is nondeterministic if M1 is, but this does not affect the second part of the state-pair
  – If M1 makes a Λ-transition, so does M, but the second component of the state-pair is unchanged
  – The stack is used as if it were the stack of M1
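The Cartesian-product construction in the proof of Theorem 6.13 can be sketched in Python. This is only an illustration under stated assumptions, not the book's presentation: the dictionary encoding of δ1 and δ2, the function names product_pda and accepts, the use of the empty string "" for Λ, and the two example machines (a PDA for {a^n b^n | n ≥ 0} and an FA accepting strings with an even number of a's) are all hypothetical choices made for this example.

```python
from collections import deque

def product_pda(delta1, delta2, fa_states):
    """Build delta for M from delta1 (PDA M1) and delta2 (FA M2).

    delta1 maps (p, symbol, stack_top) to a set of (p2, pushed_string);
    symbol "" stands for a Lambda-transition. delta2 maps (q, symbol) to q2.
    """
    delta = {}
    for (p, sym, top), moves in delta1.items():
        for q in fa_states:
            # On a Lambda-transition the FA component does not move.
            q_next = q if sym == "" else delta2[(q, sym)]
            delta[((p, q), sym, top)] = {((p2, q_next), push) for p2, push in moves}
    return delta

def accepts(delta, start, accepting, x, z0="Z"):
    """Acceptance by final state, via breadth-first search over configurations."""
    start_cfg = (start, 0, (z0,))
    seen = {start_cfg}
    queue = deque([start_cfg])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(x) and state in accepting:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        options = [("", pos)]                   # Lambda-transition
        if pos < len(x):
            options.append((x[pos], pos + 1))   # read one input symbol
        for sym, new_pos in options:
            for new_state, push in delta.get((state, sym, top), ()):
                cfg = (new_state, new_pos, tuple(push) + rest)
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# Hypothetical M1: a PDA accepting {a^n b^n | n >= 0}, accepting state q2.
delta1 = {
    ("q0", "a", "Z"): {("q0", "AZ")},
    ("q0", "a", "A"): {("q0", "AA")},
    ("q0", "b", "A"): {("q1", "")},
    ("q1", "b", "A"): {("q1", "")},
    ("q0", "", "Z"): {("q2", "Z")},
    ("q1", "", "Z"): {("q2", "Z")},
}
# Hypothetical M2: an FA accepting strings with an even number of a's.
delta2 = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}
delta = product_pda(delta1, delta2, ["even", "odd"])
start = ("q0", "even")
accepting = {("q2", "even")}    # A1 x A2
```

With these example machines, accepts(delta, start, accepting, "aabb") holds, while "ab" is rejected because the FA component ends in the state odd. The breadth-first search terminates here only because the example PDA has no Λ-transitions that grow the stack; a general simulator would need more care.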
• The rest of the proof depends on the following fact: for every state-pair (p, q), all input strings y and z, every string α of stack symbols, and every integer n ≥ 0, these two statements are equivalent:
  – (q1, yz, Z0) ⊢^n_{M1} (p, z, α) and δ2*(q2, y) = q
  – ((q1, q2), yz, Z0) ⊢^n_M ((p, q), z, α)
• Both directions can be proved by a straightforward induction argument
• See the book for details
• Thinking about nondeterminism helps to understand how it might happen that no PDA can accept precisely the strings in L’ (the complement of L), even if there is a PDA that accepts precisely the strings in L
• Example:
  – A PDA M might be able to choose between two sequences of moves on an input string x, so that both choices read all the symbols of x but only one causes M to end up in an accepting state
  – In this case, the PDA obtained from M by reversing the accepting and nonaccepting states will still accept x
• Even if M is a deterministic PDA that accepts L, the presence of Λ-transitions might prevent the PDA M’ obtained from M by reversing the accepting and nonaccepting states from accepting L’
• For a DPDA M without Λ-transitions, the machine M’ obtained by reversing the accepting and nonaccepting states of M accepts the complement of L(M)
• The complement of an arbitrary language accepted by a DPDA can be accepted by a DPDA, though the proof is not quite as obvious

Decision Problems Involving Context-Free Languages
• The membership problem for CFLs is the decision problem:
  – Given a CFG G and a string x, is x ∈ L(G)?
• For regular languages we would have either an FA to start with or a regular expression from which we could obtain one, and the question would be easy to answer: just run the FA on the string x
• Trying to use this approach for a CFL and a PDA would be more complicated, because a PDA may have nondeterminism that cannot be eliminated
• There is an algorithm to solve the membership problem starting with a CFG G that generates L
  – If x = Λ, just check whether the start variable is nullable
  – Otherwise, we have an algorithm to find a CFG G1 with no Λ-productions or unit productions so that L(G1) = L(G) − {Λ}, and we can decide whether x ∈ L(G1) by trying all possible derivations in G1 with 2|x| − 1 or fewer steps
• Other interesting decision problems include these:
  – Given a CFL L, is L nonempty?
  – Given a CFL L, is L infinite?
• We can use the pumping lemma for context-free languages to solve these problems, just as we used the pumping lemma for regular languages to solve the corresponding problems for finite automata
• We will see that some easy-to-state problems involving CFGs, such as
  – Given CFGs G1 and G2, is L(G1) ∩ L(G2) nonempty?
  – Given CFGs G1 and G2, is L(G1) ⊆ L(G2)?
  turn out to be undecidable
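As a concrete illustration of deciding membership, here is a minimal Python sketch. It does not implement the bounded derivation search described in the slides; it swaps in the CYK dynamic-programming algorithm, which requires a grammar in Chomsky normal form. The grammar is an assumed hand conversion of the earlier grammar for {a,b}* − XX to Chomsky normal form, with P and Q as hypothetical helper variables (P → AE, Q → BE); the names cyk, is_square, and check_up_to are likewise invented for this example.

```python
from itertools import product

# Assumed CNF version of: S -> A | B | AB | BA, A -> EAE | a, B -> EBE | b, E -> a | b
TERMINAL = {            # variable -> terminals it produces directly
    "S": {"a", "b"},
    "A": {"a"},
    "B": {"b"},
    "E": {"a", "b"},
}
BINARY = {              # variable -> pairs of variables it produces
    "S": {("A", "B"), ("B", "A"), ("E", "P"), ("E", "Q")},
    "A": {("E", "P")},
    "B": {("E", "Q")},
    "P": {("A", "E")},  # hypothetical helper: A -> E P stands for A -> E A E
    "Q": {("B", "E")},  # hypothetical helper: B -> E Q stands for B -> E B E
}

def cyk(x, start="S"):
    """Return True if the CNF grammar above generates x."""
    n = len(x)
    if n == 0:
        return False    # a CNF grammar never generates the null string
    # table[i][length] = set of variables that derive x[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(x):
        table[i][1] = {v for v, ts in TERMINAL.items() if ch in ts}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for v, pairs in BINARY.items():
                    if any(l in left and r in right for l, r in pairs):
                        table[i][length].add(v)
    return start in table[0][n]

def is_square(x):
    """True if x = yy for some y, i.e. x is in XX."""
    half = len(x) // 2
    return len(x) % 2 == 0 and x[:half] == x[half:]

def check_up_to(max_len):
    """Compare CYK membership with the direct definition of {a,b}* - XX."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if cyk(w) != (not is_square(w)):
                return False
    return True
```

Calling check_up_to(8) compares the table-based answer with the direct definition on every string over {a, b} of length at most 8, which is a quick sanity check on both the grammar conversion and the algorithm. CYK runs in time cubic in |x|, in contrast to the exhaustive search over derivations.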