Lecture 16: Pumping Lemma for Context-Free Languages
COT 4420 Theory of Computation
Section 8.1

Statement of the CFL Pumping Lemma

Let L be an infinite context-free language. Then there exists an integer m such that every string w ∈ L with |w| ≥ m can be decomposed as w = uvxyz with:
1. $|vxy| \le m$
2. $|vy| \ge 1$
3. $uv^ixy^iz \in L$ for all $i \ge 0$.

Proof of the Pumping Lemma

Let L be an infinite context-free language, and let G be a grammar in Chomsky Normal Form (CNF) for L − {λ}. Such a G has no unit-productions and no λ-productions. (Discarding λ is harmless, because the lemma only concerns strings of length at least m ≥ 1.)

In a derivation of a long enough string, since the number of variables is finite, some variable must repeat along a path of the parse tree.

• Let G have k variables, choose $m = 2^k$, and take a string w ∈ L with |w| ≥ m.
• Claim: the parse tree for w must have k+2 or more levels of nodes.

[Figure: CNF parse tree with k levels of variables above a bottom row of at most $2^{k-1}$ terminals]

If a parse tree of a CNF grammar has at most k+1 levels of nodes (counting the terminal leaves as a level), then its yield has length at most $2^{k-1}$. Why? If we forget about the terminals at the leaves, a CNF parse tree is a binary tree of variables: each variable either has two variable children (A → BC) or exactly one terminal child (A → a). A binary tree with at most k levels has at most $2^{k-1}$ leaves, and each leaf variable derives exactly one terminal.

• Since |w| ≥ $m = 2^k > 2^{k-1}$, w cannot be the yield of any tree with k+1 or fewer levels. Therefore the parse tree for w has at least k+2 levels.
• Thus a longest path from the root to a leaf has at least k+2 nodes: at least k+1 variables followed by one terminal.

[Figure: root-to-leaf path $V_1 = S$ (root), $V_2, V_3, \ldots, V_{k+1}, \ldots$ ending in a terminal σ; at least k+2 levels]

• Since the grammar has only k variables, by the pigeonhole principle some variable on this path is repeated, say A.
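The level-counting step can be checked with a small recursive sketch in Python. The function `max_cnf_yield` is a hypothetical helper introduced only for illustration: it returns the longest possible yield of a CNF parse tree with a given number of levels of nodes, counting the row of terminal leaves as a level, and it assumes the two CNF production shapes A → BC and A → a.

```python
def max_cnf_yield(levels: int) -> int:
    """Longest yield of a CNF parse tree with at most `levels` levels of
    nodes, counting the row of terminal leaves as a level (levels >= 2)."""
    if levels < 2:
        raise ValueError("a parse tree needs at least a root variable and a terminal")
    if levels == 2:
        return 1  # root variable with a single terminal child: A -> a
    # A -> BC with both subtrees as large as possible always beats A -> a,
    # since each subtree yields at least one terminal.
    return 2 * max_cnf_yield(levels - 1)


# With k variables and m = 2**k, a tree of k+1 levels is too small for |w| >= m.
for k in range(1, 11):
    assert max_cnf_yield(k + 1) == 2 ** (k - 1)
    assert max_cnf_yield(k + 1) < 2 ** k
```

The loop mirrors the claim above: a tree with k+1 levels yields at most $2^{k-1}$ symbols, strictly fewer than $m = 2^k$, so the parse tree of w needs at least k+2 levels.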
• Take the repetition as deep as possible: among the lowest k+1 variables on the path some variable must repeat, and we let A be such a variable. Then no variable repeats along the path below the upper A.

[Figure: parse tree of w; S derives u A z, the upper A derives v A y, and the lower A derives x]

We can write w = uvxyz, where u, v, x, y, z are strings of terminals and

$S \Rightarrow^* uAz$, $A \Rightarrow^* vAy$, $A \Rightarrow^* x$.

Other possible derivations:
• $S \Rightarrow^* uAz \Rightarrow^* uxz = uv^0xy^0z$
• $S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvvAyyz \Rightarrow^* uvvxyyz = uv^2xy^2z$

[Figure: parse trees for $uv^0xy^0z$ (the lower A subtree replaces the upper one) and $uv^2xy^2z$ (the A ⇒* vAy part repeated)]

Therefore, knowing that w = uvxyz ∈ L(G), we also know that $uv^ixy^iz \in L(G)$ for all i = 0, 1, 2, …

Observation 1: $|vy| \ge 1$.
Since G has no unit-productions and no λ-productions, v and y cannot both be empty. The derivation $A \Rightarrow^* vAy$ takes at least one step and starts with some production A → BC; one of B, C leads to the lower A, and the other derives a nonempty terminal string, which is part of v or y.

Observation 2: $|vxy| \le m$.
Since A was chosen among the lowest k+1 variables of a longest path, the subtree rooted at the upper A has at most k+1 levels of variables, i.e., at most k+2 levels of nodes. Therefore its yield vxy has length at most $2^k = m$.

Applications of the Pumping Lemma

Example 1: $L = \{a^nb^nc^n : n \ge 0\}$

Show that the language $L = \{a^nb^nc^n : n \ge 0\}$ is not context-free.

Proof: Assume, for contradiction, that L is context-free. Since L is infinite and context-free, we can apply the pumping lemma.

Let m be the critical length of the pumping lemma, and pick a string w ∈ L with |w| ≥ m.
• We pick $w = a^mb^mc^m$.
• We can write w = uvxyz such that $|vxy| \le m$ and $|vy| \ge 1$.
• The pumping lemma says $uv^ixy^iz \in L$ for all $i \ge 0$.

We examine all the possible locations of the substring vxy in w. Because $|vxy| \le m$, vxy cannot contain both an a and a c (there are m b's between them), so the possibilities are:
Case 1: vxy lies within $a^m$.
Case 2: vxy lies within $b^m$.
Case 3: vxy lies within $c^m$.
Case 4: vxy overlaps $a^m$ and $b^m$.
Case 5: vxy overlaps $b^m$ and $c^m$.

Case 1: vxy is in $a^m$.
The pumped string $uv^2xy^2z$ has more a's than b's and c's, and therefore it is not in L. Contradiction!

Case 2: vxy is in $b^m$.
Similar to Case 1: the pumped string $uv^2xy^2z$ has more b's than a's and c's, and therefore it is not in L. Contradiction!

Case 3: vxy is in $c^m$.
Similar to Case 1: the pumped string $uv^2xy^2z$ has more c's than a's and b's, and therefore it is not in L. Contradiction!

Case 4: vxy overlaps $a^m$ and $b^m$.
The pumped string $uv^2xy^2z$ still has exactly m c's, but more than m a's or more than m b's, and therefore it is not in L. Let's look at each sub-case to see what happens.

Sub-case 1: v contains only a's and y contains only b's.
$v = a^{k_1}$, $y = b^{k_2}$ with $k_1 + k_2 \ge 1$, so
$uv^2xy^2z = a^{m+k_1}b^{m+k_2}c^m \notin L$. Contradiction!

Sub-case 2: v contains a's and b's, and y contains only b's.
$v = a^{k_1}b^{k_2}$, $y = b^{k_3}$ with $k_1, k_2 \ge 1$, so
$uv^2xy^2z = a^mb^{k_2}a^{k_1}b^{m+k_3}c^m \notin L$, since a b appears before an a. Contradiction!
Sub-case 3: v contains only a's, and y contains a's and b's.
Similar to sub-case 2: $y^2$ places a b before an a, so $uv^2xy^2z \notin L$. Contradiction!

Case 5: vxy overlaps $b^m$ and $c^m$.
Similar to Case 4.

In all cases we obtained a contradiction. Therefore the original assumption that $L = \{a^nb^nc^n : n \ge 0\}$ is context-free must be wrong.

Conclusion: L is not context-free.
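As a sanity check on the case analysis (not a substitute for the proof, which must work for every m), a brute-force sketch in Python can enumerate every decomposition w = uvxyz of $w = a^mb^mc^m$ with $|vxy| \le m$ and $|vy| \ge 1$ for a few small values of m, and confirm that pumping with i = 2 always leaves L. The helpers `in_L`, `pump`, `decompositions`, and `every_split_fails` are hypothetical names introduced only for this illustration.

```python
def in_L(s: str) -> bool:
    """Membership test for L = { a^n b^n c^n : n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def pump(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Build u v^i x y^i z."""
    return u + v * i + x + y * i + z


def decompositions(w: str, m: int):
    """Yield every (u, v, x, y, z) with w = uvxyz, |vxy| <= m and |vy| >= 1."""
    n = len(w)
    for p1 in range(n + 1):                        # u = w[:p1]
        for p4 in range(p1, min(p1 + m, n) + 1):   # vxy = w[p1:p4], length <= m
            for p2 in range(p1, p4 + 1):           # v = w[p1:p2]
                for p3 in range(p2, p4 + 1):       # x = w[p2:p3], y = w[p3:p4]
                    if (p2 - p1) + (p4 - p3) >= 1:
                        yield w[:p1], w[p1:p2], w[p2:p3], w[p3:p4], w[p4:]


def every_split_fails(m: int) -> bool:
    """True if no allowed split of a^m b^m c^m survives pumping with i = 2."""
    w = "a" * m + "b" * m + "c" * m
    return all(not in_L(pump(*parts, 2)) for parts in decompositions(w, m))


for m in range(1, 6):
    assert every_split_fails(m)
```

Each split falls into one of Cases 1 to 5 above, so the enumeration exercises exactly the situations the proof rules out. It only covers finitely many values of m, whereas the argument above covers every m.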