Big picture ● All languages ● Decidable Turing machines ● NP ● P ● Context-free Context-free grammars, push-down automata ● Regular Automata, non-deterministic automata, regular expressions Recall: Theorem: L := {0n 1n : n 0} is not regular But it is often needed to recognize this language Example: Programming language syntax have matching brackets, not regular. Next: Introduce context-free languages Why study context-free languages ● ● Practice with more powerful model Programming languages: Syntax of C++, Java, etc. is context-free language (specified by context-free grammar) ● Other reasons: human language has structures that can be modeled as context-free language Example: Context-free grammar G, S = {0,1} S→0S1 S→e Two substitution rules (a.k.a. productions) → Variables = {S}, Terminals = {0,1} Derivation of 0011 in grammar: S 0S1 00S11 0011 L(G) = {0n 1n : n 0} Example: Context-free grammar G, S = {0,1} S→A S→B A→0A1 A→e B→1B0 B→e L(G) = L(A) U L(B) = {0n 1n : n 0} U {1n 0n : n 0} Example: Context-free grammar G, S = {0,1} S→A|B A→0A1|e B→1B0|e Convention: Write A → w|w' for A → w and A → w' Definition: A context-free grammar (CFG) G is a 4 tuple (V, S, R, S) where ● V is a finite set of variables ● S is a finite set of terminals (V S = ) ● R is a finite set of rules, where each rule is A→w ● A V, w (V U S)* S V is the start variable Example The language L = {ambn : m > n} is described by the CFG G = (V, S, R, S) where: V = {S, T} S = {a, b} R = { S → aS | aT T → aTb | e } Derive aaab: S→? Example The language L = {ambn : m > n} is described by the CFG G = (V, S, R, S) where: V = {S, T} S = {a, b} R = { S → aS | aT T → aTb | e } Derive aaab: S → aS →? Example The language L = {ambn : m > n} is described by the CFG G = (V, S, R, S) where: V = {S, T} S = {a, b} R = { S → aS | aT T → aTb | e } Derive aaab: S → aS → aaT →? Example The language L = {ambn : m > n} is described by the CFG G = (V, S, R, S) where: V = {S, T} S = {a, b} R = { S → aS | aT T → aTb | e } Derive aaab: S → aS → aaT → aaaTb →? Example The language L = {ambn : m > n} is described by the CFG G = (V, S, R, S) where: V = {S, T} S = {a, b} R = { S → aS | aT T → aTb | e } Derive aaab: S → aS → aaT → aaaTb → aaab Definition: Let G = (V, S, R, S) be a CFG we write uAv uwv and say uAv yields uwv if A → w is a rule We say u derives v, written u * v, if ● ● u = v, or u1, u2, … , uk k 1 : u u1 u2 … uk = v The language of the grammar is L(G) = {w : S * w} Definition: A language is context-free if CFG G : L(G) = L Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL S → RB L → BL | A R → RB | A A → BAB | # B → 0 | 1 Remark: B * ? To understand, explain what each piece does! Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL S → RB L → BL | A R → RB | A A → BAB | # Remark: A * ? B → 0 | 1 Remark: B * 0, B * 1 Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL S → RB L → BL | A R → RB | A Remark: R * ? A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Remark: B * 0, B * 1 Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL S → RB L → BL | A Remark: L * ? R → RB | A Remark: R * x#y : |x| ≤ |y| A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Remark: B * 0, B * 1 Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL S → RB Remark: RB * ? L → BL | A Remark: L * x#y : |x| ≥ |y| R → RB | A Remark: R * x#y : |x| ≤ |y| A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Remark: B * 0, B * 1 Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL Remark: BL * ? S → RB Remark: RB * x#y : |x| < |y| L → BL | A Remark: L * x#y : |x| ≥ |y| R → RB | A Remark: R * x#y : |x| ≤ |y| A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Remark: B * 0, B * 1 Example: ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| } G = S → BL Remark: BL * x#y : |x| > |y| S → RB Remark: RB * x#y : |x| < |y| L → BL | A Remark: L * x#y : |x| ≥ |y| R → RB | A Remark: R * x#y : |x| ≤ |y| A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Remark: B * 0, B * 1 L(G) = L Example: CFG for expressions in programming languages Task: recognize strings like 0 + 0 + 1 x (1 + 0) S → S+S | S x S | ( S ) | 0 | 1 S→ S+S → 0+S → 0+S+S → 0+0+S →0+0+SxS → 0+0+1xS → 0 + 0 + 1 x (S) → 0 + 0 + 1 x (S + S) → 0 + 0 + 1 x (1 + S) → 0 + 0 + 1 x (1 + 0) We have seen: CFG, definition, and examples Next: Ambiguity ● Ambiguity: Some string may have multiple derivations in a CFG ● Ambiguity is a problem for compilers: Compilers use derivation to give meaning to strings. Example: meaning of 1+0x0 ∈ ∑* is its value, 1 ∈ℕ If there are two different derivations, the value may not be well defined. Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S→? Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S → SxS → ? Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S → SxS → Sx0 → ? Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S → SxS → Sx0 → S+Sx0 → ? Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S → SxS → Sx0 → S+Sx0 → S+0x0 → ? Example: The string 1+0x0 has two derivations in S → S+S | S x S | ( S ) | 0 | 1 One derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another derivation: S → SxS → Sx0 → S+Sx0 → S+0x0 → 1+0x0 We now want to define CFG with no ambiguity Definition: A derivation is leftmost if at every step the leftmost variable is expanded st Example: the 1 previous derivation was leftmost S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Definition: A CFG G is un-ambiguous if no string has two different leftmost derivations. Example The CFG S → S+S | S x S | ( S ) | 0 | 1 is ambiguous because 1+0x0 has two distinct leftmost derivations One leftmost derivation: S → S+S → 1+S → 1+SxS → 1+0xS → 1+0x0 Another leftmost derivation: S → SxS → S+SxS → 1+SxS → 1+0xS → 1+0x0 Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→T+T→? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→T+T→F+T→? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→T+T→F+T→1+T →? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→T+T→F+T→1+T →1+TxF→? Example Instead of using CFG S → S+S | S x S | ( S ) | 0 | 1 we may use un-ambiguous grammar S→S+T|T T→TxF|F F → 0 | 1 | (S) Unique leftmost derivation of 1+0x0: S→S+T→T+T→F+T→1+T →1+TxF→1+0xF→1+0x0 Actual Java specification grammar snippet Cumbersome but un-ambiguous MultiplicativeExpression: UnaryExpression MultiplicativeExpression * UnaryExpression MultiplicativeExpression / UnaryExpression MultiplicativeExpression % UnaryExpression AdditiveExpression: MultiplicativeExpression AdditiveExpression + MultiplicativeExpression AdditiveExpression - MultiplicativeExpression Next: understand power of context-free languages Study closure under not, U, o, * Recall from regular langues: If A, B are regular then not A is regular ? A U B is regular ? A o B is regular ? A* is regular ? Next: understand power of context-free languages Study closure under not, U, o, * Recall from regular langues: If A, B are regular then not A regular A U B regular A o B regular A* regular Suppose A, B are context-free: A = L(GA) for CFG GA=(VA, S, RA, SA) B = L(GB) for CFG GB=(VB, S, RB, SB) What about AUB AoB A* S→? Suppose A, B are context-free: A = L(GA) for CFG GA=(VA, S, RA, SA) B = L(GB) for CFG GB=(VB, S, RB, SB) What about AUB S → SA|SB AoB A* S→? Context-free Suppose A, B are context-free: A = L(GA) for CFG GA=(VA, S, RA, SA) B = L(GB) for CFG GB=(VB, S, RB, SB) What about AUB S → SA|SB Context-free AoB S → S A SB Context-free A* S→? Suppose A, B are context-free: A = L(GA) for CFG GA=(VA, S, RA, SA) B = L(GB) for CFG GB=(VB, S, RB, SB) What about AUB S → SA|SB Context-free AoB S → S A SB Context-free A* S → SSA | e Context-free Above all context-free! In general, (not A) is NOT context-free Suppose A, B are context-free: A = L(GA) for CFG GA=(VA, S, RA, SA) B = L(GB) for CFG GB=(VB, S, RB, SB) What about AUB S → SA|SB Context-free AoB S → S A SB Context-free A* S → SSA | e Context-free Above also shows regular ⇨context-free Context-free languages contain regular languages Example: Context Free UNION ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| OR x = yR } yR is the reverse of y: 001R = 100 11010R = 01011 1R =1 Example: Context Free UNION ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| OR x = yR } Write L = L1 U L2, where L1 = { x#y : |x| |y| } L2 = { x#y : x = yR } Example: Context Free UNION ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| OR x = yR } Write L = L1 U L2, where L1 = { x#y : |x| |y| } L2 = { x#y : x = yR } G1= S1 → BL | RB L → BL | A Remark: L * x#y : |x| ≥ |y| R → RB | A Remark: R * x#y : |x| ≤ |y| A → BAB | # Remark: A * x#y : |x|=|y| B → 0 | 1 Example: Context Free UNION ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| OR x = yR } Write L = L1 U L2, where L1 = { x#y : |x| |y| } G1= S1 → BL | RB L → BL | A R → RB | A A → BAB | # B → 0 | 1 L2 = { x#y : x = yR } G2= S2 → 0S20 | 1S21 | # Example: Context Free UNION ∑ = {0,1,#} Give a CFG for L = { x#y : x,y in {0,1}* |x| |y| OR x = yR } Write L = L1 U L2, where L1 = { x#y : |x| |y| } G1= S1 → BL | RB L → BL | A R → RB | A A → BAB | # B → 0 | 1 L2 = { x#y : x = yR } G2= S2 → 0S20 | 1S21 | # Let G = S → S1 | S2 Then, L(G1) = L1 & L(G2) = L2 L(G) = L1 U L2 = L Example: Context Free CONCATENATION m m n n Give a CFG for L = { 0 1 0 1 : m even and n odd} Example: Context Free CONCATENATION m m n n Give a CFG for L = { 0 1 0 1 : m even and n odd} Write L = L1 o L2, where L1 = { 0m1m : m even} L2 = { 0n1n : n odd} Example: Context Free CONCATENATION m m n n Give a CFG for L = { 0 1 0 1 : m even and n odd} Write L = L1 o L2, where L1 = { 0m1m : m even} G1= S1 → 00S111 | e L2 = { 0n1n : n odd} Example: Context Free CONCATENATION m m n n Give a CFG for L = { 0 1 0 1 : m even and n odd} Write L = L1 o L2, where L1 = { 0m1m : m even} G1= S1 → 00S111 | e L2 = { 0n1n : n odd} G2= S2 → 00S211 | 01 Example: Context Free CONCATENATION m m n n Give a CFG for L = { 0 1 0 1 : m even and n odd} Write L = L1 o L2, where L1 = { 0m1m : m even} G1= S1 → 00S111 | e L2 = { 0n1n : n odd} G2= S2 → 00S211 | 01 Let G = S → S1S2 Then, L(G1) = L1 & L(G2) = L2 L(G) = L1 o L2 = L Example: Context Free STAR Give a CFG for L = { w in {0,1}* : w = w1 w2 ▪▪▪ wk , k 0 where each wi is a palindrome } ● A string w is a palindrome if w = wR That is, w reads the same forwards and backwards ● Example: 00100, 1001, and 0 are palindromes; 0011, 01 are not Example: Context Free STAR Give a CFG for L = { w in {0,1}* : w = w1 w2 ▪▪▪ wk , k 0 where each wi is a palindrome } Write L = L1*, where L1 = {w : w is a palindrome} Note: In fact, L = {0,1}*, but we will not use that. Example: Context Free STAR Give a CFG for L = { w in {0,1}* : w = w1 w2 ▪▪▪ wk , k 0 where each wi is a palindrome } Write L = L1*, where L1 = {w : w is a palindrome} G1= S1 → 0S10 | 1S11 | 0 | 1 | e Example: Context Free STAR Give a CFG for L = { w in {0,1}* : w = w1 w2 ▪▪▪ wk , k 0 where each wi is a palindrome } Write L = L1*, where L1 = {w : w is a palindrome} G1= S1 → 0S10 | 1S11 | 0 | 1 | e Let G = S → SS1 | e. Then, L(G1) = L1 L(G) = L1* = L. Beyond regular A string and its reversal with C in middle: S → 0S0 | 1S1 | C Example: S ⇨* 0001C1000 More generally, to get strings of the form Ak C Bk use rules: S → A S B | C Example: S = {0,1,#}, wR is reverse of w L = {w # x : wR is a substring of x} Useful to rewrite L as: Example: S = {0,1,#}, wR is reverse of w L = {w # x : wR is a substring of x} = { w#x wR y : w,x,y {0,1}* } G := S → CB C → 0C0 | 1C1 | #B B→0B|1B|e Remark: B * ? Example: S = {0,1,#}, wR is reverse of w L = {w # x : wR is a substring of x} = { w#x wR y : w,x,y {0,1}* } G := S → CB C → 0C0 | 1C1 | #B Remark: C * ? B→0B|1B|e Remark: B * {0,1}* Example: S = {0,1,#}, wR is reverse of w L = {w # x : wR is a substring of x} = { w#x wR y : w,x,y {0,1}* } G := S → CB C → 0C0 | 1C1 | #B Remark: C * w#{0,1}*wR B→0B|1B|e Remark: B * {0,1}* L(G) = L CFG vs. automata CFG non-deterministic pushdown automata (PDA) A PDA is simply an NFA with a stack. q1 x, y → z q2 This means: “read x from the input; pop y off the stack; push z onto the stack” Any of x,y,z may be e. Example: PDA for L = {0n1n : n 0} q0 e, e → $ 0, e → 0 1, 0 → e q1 q2 e, e → e e, $ → e q3 The $ is a special symbol to recognize end of stack Idea: q1 : read and push 0s onto stack until no more q2 : read 1s and match with 0s popped from stack Unlike the case for regular automata, non-deterministic PDA are strictly more powerful than deterministic PDA. Compilers must work with deterministic PDA, an important subclass of context-free languages Non-context-free languages Intuition: If L involves regular expressions and/or nested matchings then probably context-free. If not, probably not. { 0n 1n : n 0 } CF : 000 111 nested {w w : w S*} not CF: 1101 1101 not nested {0n1n2n : n0} not CF: 00 11 22 not nested Non-context-free languages There is a pumping lemma for context-free languages. Similar to the one for regular, but simultaneously “pump” string in two parts: w = u vi x yi z Context-free pumping lemma: L is CF language p 0 w L, |w| p u,v,x,y,z : w= uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L Context-free pumping lemma: L is CF language p 0 w L, |w| p u,v,x,y,z : w= uvxyz, |vy|> 0, |vxy| p Proof idea: i 0 : uvixyiz L Let G be CFG : L(G) = L If w L is very long, derivation repeats a variable V (like repeat states in regular P.L.) vxy = piece of w that V derives: V ⇨* vxy Because V repeated once, can repeat it again Context-free pumping lemma: L is CF language p 0 A w L, |w| p u,v,x,y,z : w= uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L Useful to prove L NOT context-free. Use contrapositive: L context-free language A same as (not A) L not context-free Context-free pumping lemma (contrapositive) p 0 not A w L, |w| p u,v,x,y,z : L not context-free w = uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L To prove L not context-free it is enough to prove not A Not A is the stuff in the box. Context-free pumping lemma (contrapositive) p 0 w L, |w| p u,v,x,y,z : L not context-free w = uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L Adversary picks p, You pick w ∈ L of length p, Adversary decomposes w = uvxyz, |vy| > 0, |vxy| p You pick i 0 Finally, you win if uvixyiz L Theorem: L := {an bn cn : n 0} is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := ap bp cp u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p You move i := 2 i 0 : uvixyiz L You must show uvvxyyz L: vy misses at least one symbol in ∑ = {a,b,c} since ? Theorem: L := {an bn cn : n 0} is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := ap bp cp u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p You move i := 2 i 0 : uvixyiz L You must show uvvxyyz L: vy misses at least one symbol in ∑ = {a,b,c} since between as and cs there are p bs, and |vy| ≤ p so uvvxyyz ???? Theorem: L := {an bn cn : n 0} is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := ap bp cp u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p You move i := 2 i 0 : uvixyiz L You must show uvvxyyz L: vy misses at least one symbol in ∑ = {a,b,c} since between as and cs there are p bs, and |vy| ≤ p so uvvxyyz has too few of that symbol, so ∉ L DONE Theorem: L := {ai bj ck : 0 i j k } is not context-free. Proof: Adversary moves p You move w := ap bp cp Adversary moves u,v,x,y,z p 0 w L, |w| p u,v,x,y,z : w = uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L So far, same as {an bn cn : n 0}. But now we need a few cases. Our choice of i depends on u,v,x,y,z Theorem: L := {ai bj ck : 0 i j k } is not context-free. Proof (cont.): You have w = apbpcp, with w = uvxyz, |vy|> 0, |vxy| p. You must pick i 0 and show uvixyiz L. If no a's in vy: ? Theorem: L := {ai bj ck : 0 i j k } is not context-free. Proof (cont.): You have w = apbpcp, with w = uvxyz, |vy|> 0, |vxy| p. You must pick i 0 and show uvixyiz L. If no a's in vy: uv xy0z has fewer b's or c's than a's. 0 If no c's in vy: ? Theorem: L := {ai bj ck : 0 i j k } is not context-free. Proof (cont.): You have w = apbpcp, with w = uvxyz, |vy|> 0, |vxy| p. You must pick i 0 and show uvixyiz L. If no a's in vy: uv xy0z has fewer b's or c's than a's. 0 If no c's in vy: uv2xy2z has more a's or b's than c's. If no b's in vy: ? Theorem: L := {ai bj ck : 0 i j k } is not context-free. Proof (cont.): You have w = apbpcp, with w = uvxyz, |vy|> 0, |vxy| p. You must pick i 0 and show uvixyiz L. If no a's in vy: uv xy0z has fewer b's or c's than a's. 0 If no c's in vy: uv2xy2z has more a's or b's than c's. If no b's in vy: You fall in a previous case, since |vxy| p DONE Theorem: L := {s s : s {0,1}* } is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := 0p 1p 0p 1p u,v,x,y,z : w = uvxyz, |vy|> 0, |vxy| p i 0 : uvixyiz L Note: To prove L not regular we moved w = 0p 1 0p 1 That move does not work for context-free! Theorem: L := {s s : s {0,1}* } is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := 0p 1p 0p 1p u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p Three cases: vxy in 1st half of w: ? i 0 : uvixyiz L Theorem: L := {s s : s {0,1}* } is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := 0p 1p 0p 1p u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p Three cases: i 0 : uvixyiz L vxy in 1st half of w: 2nd half of uv2xy2z starts with 1, but uv2xy2z still starts with 0. vxy in 2nd half of w: ? Theorem: L := {s s : s {0,1}* } is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := 0p 1p 0p 1p u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p Three cases: i 0 : uvixyiz L vxy in 1st half of w: 2nd half of uv2xy2z starts with 1, but uv2xy2z still starts with 0. vxy in 2nd half of w: 1st half of uv2xy2z ends with 0, but uv2xy2z still ends with 1. vxy touches midpoint: ? Theorem: L := {s s : s {0,1}* } is not context-free. Proof: p 0 Adversary moves p w L, |w| p You move w := 0p 1p 0p 1p u,v,x,y,z : w = uvxyz, Adversary moves u,v,x,y,z |vy|> 0, |vxy| p Three cases: i 0 : uvixyiz L vxy in 1st half of w: 2nd half of uv2xy2z starts with 1, but uv2xy2z still starts with 0. vxy in 2nd half of w: 1st half of uv2xy2z ends with 0, but uv2xy2z still ends with 1. vxy touches midpoint: uv0xy0z = 0p 1i 0j 1p with either i < p or j < p. DONE L := { w {a,b}* : w has same number of a and b} Grammar for L ?? L := { w {a,b}* : w has same number of a and b} Grammar for L S → ε | SS | aSb | bSa Not clear why this works. It requires a proof. Proofs by induction Let P(n) be any claim To prove “∀n ≥ 0, P(n) is true” it suffices to prove Base case: P(0) is true Induction step: ∀ n : ( (∀ i < n, P(i) ) => P(n) ) Induction hypothesis You can replace “0” by any fixed value Example: P(n) = ∑ i=0n i = n(n+1)/2 Claim: ∀ n ≥ 0, P(n) Proof by induction: Base case: P(0) 0 = 0(1)/2 = 0 is true Induction step: ∀ n : ( (∀ i < n, P(i) ) => P(n) ) ∑ i=0n i = ?? Example: P(n) = ∑ i=0n i = n(n+1)/2 Claim: ∀ n ≥ 0, P(n) Proof by induction: Base case: P(0) 0 = 0(1)/2 = 0 is true Induction step: ∀ n : ( (∀ i < n, P(i) ) => P(n) ) ∑ i=0n i = ∑ i=0n-1 i + n = (n-1)n/2 + n = n(n+1)/2 L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “only if”: Suppose S →* w. Must show w ∈ L. This fact is self-evident. We show a proof by induction nevertheless, as a warm-up for the other direction, which is not self-evident. L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “only if”: Suppose S →* w. Must show w ∈ L. Let P(n) = any w ∈ {S,a,b}* such that S →* w in n steps has same number of a and b. Base case (n=1): ?? L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “only if”: Suppose S →* w. Must show w ∈ L. Let P(n) = any w ∈ {S,a,b}* such that S →* w in n steps has same number of a and b. Base case (n=1): ε, SS, aSb, bSa have same number. Induction step: Suppose S →* w' → w where S →* w' in n-1 steps. By induction hypothesis, ?? L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “only if”: Suppose S →* w. Must show w ∈ L. Let P(n) = any w ∈ {S,a,b}* such that S →* w in n steps has same number of a and b. Base case (n=1): ε, SS, aSb, bSa have same number. Induction step: Suppose S →* w' → w where S →* w' in n-1 steps. By induction hypothesis, w' has same number of a, b. Since any rule adds same number of a and b, w has too. L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “if”: Suppose w ∈ L. Must show S →* w Let P(n) = ∀ w ∈ {S,a,b}*, |w| = n, S →* w. Base case: w = ε. Use rule ?? L := { w {a,b}* : w has same number of a and b} S → ε | SS | aSb | bSa Claim: For any w ∈ {a,b}* , S →* w if and only if w ∈ L Proof of “if”: Suppose w ∈ L. Must show S →* w Let P(n) = ∀ w ∈ {S,a,b}*, |w| = n, S →* w. Base case: w = ε. Use rule S → ε Induction step: Let |w| = n. This step is more complicated, and is the “creative step” of this proof. S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = ?? S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∃ 0 < i < n : ci = 0 then w = ?? S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∃ 0 < i < n : ci = 0 then w = w' w'', where w', w'' ∈ L, and |w'| <n, |w''| < n By induction hypothesis. S → * w', S → * w''. Hence S → SS → * w' S → w' w'' = w S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∀ 0 < i < n : ci > 0 then w = ?? S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∀ 0 < i < n : ci > 0 then w = a w' b, where w' ∈ L and |w'| < n By induction hypothesis. S → * w' Hence S → aSb → * a w' b = w S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∀ 0 < i < n : ci < 0 then w = ?? S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 If ∀ 0 < i < n : ci < 0 then w = b w' a, where w' ∈ L and |w'| < n By induction hypothesis. S →* w' Hence S → bSa → * b w' a = w S → ε | SS | aSb | bSa Induction step: Let |w| = n. Define ci := number of a - number of b in w1 w2 … wi c0 = 0 cn = 0 These three cover all cases, because two consecutive ci differ by 1. So the ci cannot change sign without going through 0 DONE