CSCI 3130: Formal Languages and Automata Theory
Tutorial 5
Hung Chun Ho
Office: SHB 1026
Department of Computer Science & Engineering

Agenda
• Cocke-Younger-Kasami (CYK) algorithm
  – Parsing a CFG in normal form
• Pushdown Automata (PDA)
  – Design

CYK Algorithm
Bottom-up parsing for grammars in normal form

Cocke-Younger-Kasami Algorithm
• Used to parse a context-free grammar in Chomsky normal form (or simply normal form)
• Normal form: every production is of one of these types
  1) X → YZ
  2) X → a
  3) S → ε
• Example:
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c

CYK Algorithm – Idea
• = Algorithm 2 in Lecture Note (10L8.pdf)
• Idea: bottom-up parsing
• Algorithm: given a string s of length N
  For k = 1 to N
    For every substring of length k
      Determine which variable(s) can derive it

CYK Algorithm – Example
• CFG
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
• Parse abbc

CYK Algorithm – Idea (1)
• Idea: we parse the substrings of abbc in this order:
• Length-1 substrings: a, b, b, c
• Length-2 substrings: ab, bb, bc
• Length-3 substrings: abb, bbc
• Length-4 substring: abbc
• Done!
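The visiting order above can be sketched in a few lines of Python. This is only an illustration: the helper name substrings_by_length is hypothetical and does not come from the lecture notes, and start indices are 1-based to match the slides.

```python
def substrings_by_length(s):
    """Yield (length, start, substring) in the order CYK visits them.

    Start indices are 1-based, as on the slides.
    """
    n = len(s)
    for k in range(1, n + 1):          # substring length
        for i in range(1, n - k + 2):  # 1-based start index
            yield k, i, s[i - 1:i - 1 + k]


for length, start, sub in substrings_by_length("abbc"):
    print(length, start, sub)
```

Shorter substrings always come first, so by the time a substring is reached, every piece it can be split into has already been handled.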
CYK Algorithm – Idea (2)
• Idea: parsing a longer substring depends on the parsing of shorter substrings
• Example: abb may be decomposed as
  – ab + b
  – a + bb
• If we know how to parse ab and b (or a and bb), then we know how to parse abb

CYK Algorithm – Substring
• Denote sub(i, j) := the substring with start index i and end index j
• Example: for abbc, sub(2,4) = bbc
• This notation is only for convenience in the following discussion

CYK Algorithm – Table
• Each cell corresponds to a substring
• Each cell stores the variables deriving that substring
• Example: the cell for length 3 and start index 2 is sub(2,4) = bbc

  Length of substring
    4 |
    3 |         sub(2,4)
    2 |
    1 |
      +------------------------------
          a       b       b       c
          1       2       3       4     Start index of substring

CYK Algorithm – Simulation
• Grammar:
  S → AB
  A → CC | a | c
  B → BC | b
  C → CB | BA | c
• Base case: length = 1
  – The possible variable(s) are found by scanning each production
  – a: A (from A → a)
  – b: B (from B → b)
  – c: A, C (from A → c and C → c)
• Loop: length = 2
  – For each substring of length 2
    • Decompose it into shorter substrings
    • Check the cells below it
• For sub(1,2) = ab:
  – ab = a + b = sub(1,1) + sub(2,2)
  – Possible choices: AB
  – Scan rules: S (from S → AB)
• For sub(2,3) = bb:
  – bb = b + b = sub(2,2) + sub(3,3)
  – Possible choices: BB
  – Scan rules: ∅. No suitable rule is found, so the CFG cannot derive this substring
• For sub(3,4) = bc:
  – bc = b + c = sub(3,3) + sub(4,4)
  – Possible choices: BA, BC
  – Scan rules: B, C (from B → BC and C → BA)
• For sub(1,3) = abb:
  – abb = ab + b = sub(1,2) + sub(3,3)
    • Possible choices: SB
    • Scan rules: ∅. No suitable variable is found yet, but there is another way to decompose the string
  – abb = a + bb = sub(1,1) + sub(2,3)
    • Possible choices: ∅. The smaller substring bb cannot be parsed, so there is no need to scan the rules
  – Neither decomposition gives a valid parsing, so abb cannot be parsed: ∅
• For sub(2,4) = bbc:
  – bbc = sub(2,2) + sub(3,4)
    • Possible choices: BB, BC
    • Variable: B (from B → BC)
  – bbc = sub(2,3) + sub(4,4)
    • Possible choices: ∅
• Finally, for sub(1,4) = abbc:
  – Possible choices: AB from sub(1,1) + sub(2,4); SB, SC from sub(1,2) + sub(3,4); ∅ from sub(1,3) + sub(4,4)
  – Variables: S (from S → AB)
  – This cell represents the original string and it contains S, so abbc is in the language
• Completed table:

  Length of substring
    4 |   S
    3 |   ∅       B
    2 |   S       ∅       B, C
    1 |   A       B       B       A, C
      +------------------------------
          a       b       b       c
          1       2       3       4     Start index of substring

CYK Algorithm – Parse Tree
• abbc is in the language!
• How to obtain the parse tree?
  – Trace back the derivations:
    • sub(1,4) is derived using S → AB from sub(1,1) and sub(2,4)
    • sub(1,1) is derived using A → a
    • sub(2,4) is derived using B → BC from sub(2,2) and sub(3,4)
    • …
  – So, also record the derivations used!
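The table filling and trace-back described above can be sketched in Python as follows. This is a minimal sketch, not the lecture's reference implementation: the names TERMINAL_RULES, BINARY_RULES, cyk, parse_tree and cyk_accepts are hypothetical, the grammar is the tutorial's example hard-coded, and each cell keeps only the first derivation found for a variable.

```python
# Example grammar from the slides, in Chomsky normal form.
TERMINAL_RULES = {"A": ["a", "c"], "B": ["b"], "C": ["c"]}   # X -> terminal
BINARY_RULES = [("S", "A", "B"), ("A", "C", "C"), ("B", "B", "C"),
                ("C", "C", "B"), ("C", "B", "A")]            # X -> Y Z


def cyk(w):
    """Fill the CYK table; table[(i, j)] maps each variable deriving
    sub(i, j) to a back-pointer (1-based, inclusive indices)."""
    n = len(w)
    table = {}
    # Base case: substrings of length 1.
    for i in range(1, n + 1):
        table[(i, i)] = {var: w[i - 1]
                         for var, terminals in TERMINAL_RULES.items()
                         if w[i - 1] in terminals}
    # Longer substrings, shortest first.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = {}
            for k in range(i, j):  # split into sub(i, k) + sub(k+1, j)
                for head, left, right in BINARY_RULES:
                    if left in table[(i, k)] and right in table[(k + 1, j)]:
                        # Record the derivation used (first one found).
                        cell.setdefault(head, (left, right, k))
            table[(i, j)] = cell
    return table


def parse_tree(table, var, i, j):
    """Trace back the recorded derivations into a nested tuple."""
    back = table[(i, j)][var]
    if isinstance(back, str):          # rule var -> terminal
        return (var, back)
    left, right, k = back              # rule var -> left right
    return (var,
            parse_tree(table, left, i, k),
            parse_tree(table, right, k + 1, j))


def cyk_accepts(w, start="S"):
    # The example grammar has no rule S -> ε, so the empty string is rejected.
    return bool(w) and start in cyk(w)[(1, len(w))]


table = cyk("abbc")
print(sorted(table[(3, 4)]))           # variables deriving bc
print(parse_tree(table, "S", 1, 4))
```

Storing the split point k together with the two variables is what makes the trace-back possible; a table that stores only variable names can answer membership but cannot rebuild the tree.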
CYK Algorithm – Parse Tree
• Obtained from the table
• Derivation read off the table: S ⇒ AB ⇒ aB ⇒ aBC ⇒ abC ⇒ abBA ⇒ abbA ⇒ abbc

CYK Algorithm – Conclusion
• A bottom-up parsing algorithm
  – Dynamic programming
  – The solution of a subproblem (parsing of a substring) depends on that of smaller subproblems
• Before employing the CYK algorithm, convert the grammar into normal form
  – Remove ε-productions
  – Remove unit-productions

CYK Algorithm – Detailed
D = “On input w = w1w2…wn:
  If w = ε and S → ε is a rule, accept.
  For i = 1 to n:
    For each variable A:
      Test whether A → b is a rule, where b = wi.
      If so, place A in table(i, i).
  For l = 2 to n:
    For i = 1 to n - l + 1:
      Let j = i + l - 1.
      For k = i to j - 1:
        For each rule A → BC:
          If table(i, k) contains B and table(k+1, j) contains C,
            put A in table(i, j).
  If S is in table(1, n), accept. Otherwise, reject.”

Pushdown Automata
NFA with unbounded memory (a stack)

Pushdown Automata
• PDA ≈ NFA with a stack as memory
• Transition:
  – NFA: depends on the input
  – PDA: depends on the input and the top of the stack (possibly ε)
    • Push a symbol onto the stack (possibly ε)
    • Pop a symbol from the stack
    • Read a terminal from the string (possibly ε)
• Transitions are nondeterministic

Pushdown Automata and NFA
• Accept:
  – NFA: go to an accept state
  – PDA: go to an accept state

PDA – Example 1
• Given the following language:
  L = {0^i 1^j : i ≤ j ≤ 2i, i = 0, 1, …}, Σ = {0, 1}
• Design a PDA for it

PDA – Example 1 – Idea
• Idea: the input has two sections
  – First section
    • All ‘0’s
  – Second section
    • All ‘1’s
• #‘1’ depends on #‘0’
  – #‘0’ ≤ #‘1’ ≤ #‘0’ × 2

PDA – Example 1 – Solution
• Solution: state diagram with states q0 to q3. A label “a,b/c” means read a, pop b, push c; e stands for ε. Transition labels in the diagram:
  – e,e/$
  – 0,e/X
  – e,e/e
  – 1,X/e
  – 1,X/X followed by 1,X/e
  – e,$/e

PDA – Example 1 – Explain
• Let’s try some string: w = 00111
  – See the whiteboard for the simulation
• e,e/$ indicates the start of parsing
• 0,e/X saves information about #‘0’
  – #‘X’ in the stack = #‘0’
• The transitions reading ‘1’ account for #‘1’
  – #‘0’ ≤ #‘1’ ≤ #‘0’ × 2
  – 1,X/e: consume one ‘X’ and eat one ‘1’
  – 1,X/X followed by 1,X/e: consume one ‘X’ and eat two ‘1’s
  – So each ‘X’ is consumed while eating either one ‘1’ or two ‘1’s
• e,$/e indicates the end of parsing

PDA – Example 2
• Given the following language:
  L = {a^i b^j c^k d^l : i, j, k, l = 0, 1, …; i + k = j + l}, where the alphabet Σ = {a, b, c, d}
• Design a PDA for it

PDA – Example 2 – Idea
• Idea:
  – Sequentially read (multiple) ‘a’, ‘b’, ‘c’ and ‘d’
  – Maintain:
    • #‘a’ + #‘c’
    • #‘b’ + #‘d’
  – If these numbers are equal
    • Accept

PDA – Example 2 – Solution
• Solution: state diagram with stages start → a → b → c → d → end (state labels q1 to q5 in the diagram). Transition labels per stage:
  – start: e,e/$
  – a: a,e/X, then e,e/e to the next stage
  – b: b,X/e, b,$/$Y, b,Y/YY, then e,e/e to the next stage
  – c: c,Y/e, c,$/$X, c,X/XX, then e,e/e to the next stage
  – d: d,X/e, d,$/$Y, d,Y/YY, then e,$/e to the end

PDA – Example 2 – Explain
• Each X in the stack = an extra a or c
• Each Y in the stack = an extra b or d
• X and Y ‘cancel’ each other
• The stack contains only X’s or only Y’s
• No X’s and no Y’s means
  – #a + #c = #b + #d, so accept
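To try strings such as w = 00111 without a whiteboard, both PDAs can be run through a small nondeterministic simulator. The sketch below is one possible reading of the two diagrams, not the official solution: the state names ("start", "zeros", "ones", "mid", "a", "b", "c", "d", "end") and the names pda_accepts, EXAMPLE1 and EXAMPLE2 are hypothetical, and the stack is a string whose right end is the top, so pushing "$X" leaves X on top of $.

```python
from collections import deque

EPS = ""  # written "e" on the slides


def pda_accepts(delta, start, accept_states, w):
    """Breadth-first search over configurations (state, position, stack).

    delta maps (state, read, pop) to a list of (next_state, push).
    Assumes the PDA has no ε-loop that grows the stack forever.
    """
    first = (start, 0, "")
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if state in accept_states and pos == len(w):
            return True
        for (src, read, pop), targets in delta.items():
            if src != state:
                continue
            if read and w[pos:pos + 1] != read:   # wrong or missing input symbol
                continue
            if pop and not stack.endswith(pop):   # wrong stack top
                continue
            rest = stack[:len(stack) - len(pop)]
            for nxt, push in targets:
                config = (nxt, pos + len(read), rest + push)
                if config not in seen:
                    seen.add(config)
                    queue.append(config)
    return False


# Example 1: L = {0^i 1^j : i <= j <= 2i}
EXAMPLE1 = {
    ("start", EPS, EPS): [("zeros", "$")],
    ("zeros", "0", EPS): [("zeros", "X")],
    ("zeros", EPS, EPS): [("ones", EPS)],
    # Either consume an X with one '1', or keep it and read a second '1'.
    ("ones", "1", "X"): [("ones", EPS), ("mid", "X")],
    ("mid", "1", "X"): [("ones", EPS)],
    ("ones", EPS, "$"): [("end", EPS)],
}

# Example 2: L = {a^i b^j c^k d^l : i + k = j + l}
EXAMPLE2 = {
    ("start", EPS, EPS): [("a", "$")],
    ("a", "a", EPS): [("a", "X")],
    ("a", EPS, EPS): [("b", EPS)],
    ("b", "b", "X"): [("b", EPS)],
    ("b", "b", "$"): [("b", "$Y")],
    ("b", "b", "Y"): [("b", "YY")],
    ("b", EPS, EPS): [("c", EPS)],
    ("c", "c", "Y"): [("c", EPS)],
    ("c", "c", "$"): [("c", "$X")],
    ("c", "c", "X"): [("c", "XX")],
    ("c", EPS, EPS): [("d", EPS)],
    ("d", "d", "X"): [("d", EPS)],
    ("d", "d", "$"): [("d", "$Y")],
    ("d", "d", "Y"): [("d", "YY")],
    ("d", EPS, "$"): [("end", EPS)],
}

print(pda_accepts(EXAMPLE1, "start", {"end"}, "00111"))
print(pda_accepts(EXAMPLE2, "start", {"end"}, "abcd"))
```

Exploring every configuration breadth-first mirrors the nondeterminism of the PDA: the string is accepted if any sequence of choices ends in the accept state with the whole input read.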