Parse (American Heritage Dict.): v. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.

    the dog loves the cat
    the loves dog the cat    ×
    the cat the dog loves    ×

4/7/2015 IT 327


The most practical parsers

Predictive parser: no backtracking.

    1. Input (token string)
    2. Stack, parsing table
    3. Output (syntax tree, intermediate code)


Two kinds of predictive parsers

Top-down: the syntax tree is built up from the root.
    Example: LL(1) parser
        L: left-to-right scanning
        L: leftmost derivation
        1: one symbol of look-ahead

Bottom-up: the syntax tree is built up from the leaves.
    Example: LR(1) parser
        L: left-to-right scanning
        R: rightmost derivation (constructed in reverse)
        1: one symbol of look-ahead


An LL(1) grammar

    1. S → ASb
    2. S → C
    3. A → a
    4. C → cC
    5. C → ε

A left-most derivation of aaaccbbb:

    S ⇒ ASb ⇒ aSb ⇒ aASbb ⇒ aaSbb ⇒ aaASbbb ⇒ aaaSbbb ⇒ aaaCbbb ⇒ aaacCbbb ⇒ aaaccCbbb ⇒ aaaccbbb

LL(1) parsing table ($ is the end-of-file symbol; a blank entry is an error):

         a    b    c    $
    S    1    2    2    2
    A    3
    C         5    4    5


Recursive-descent parser

Each non-terminal becomes a procedure with one case for every possible terminal and for the end-of-file symbol. The case chosen is the rule in the corresponding table entry.

    S():
        switch (token) {
            case a: A(); S(); get(b); build S → ASb; break;
            case b: C(); build S → C; break;
            case c: C(); build S → C; break;
            case $: C(); build S → C; break;
        }

    A():
        switch (token) {
            case a: get(a); build A → a; break;
            case b: error; break;
            case c: error; break;
            case $: error; break;
        }

    C():
        switch (token) {
            case a: error; break;
            case b: build C → ε; break;
            case c: get(c); C(); build C → cC; break;
            case $: build C → ε; break;
        }
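The three procedures can be tried out directly. The following is a minimal Python sketch, not the course's reference implementation, for the grammar S → ASb | C, A → a, C → cC | ε. It assumes every input character is one token and that `$` is appended as the end-of-file symbol. The names `parse` and `ParseError` and the choice to return the list of rule numbers (instead of building a syntax tree) are illustrative assumptions.

```python
class ParseError(Exception):
    """Raised when the look-ahead token has no entry in the parsing table."""


def parse(text):
    """Parse text with S -> ASb | C, A -> a, C -> cC | epsilon.

    Returns the rule numbers in the order they are predicted, which is
    the order of a left-most derivation.
    """
    tokens = list(text) + ["$"]  # assumption: one character per token
    pos = 0
    rules = []

    def token():
        return tokens[pos]

    def get(expected):
        # Consume one token, which must be the expected terminal.
        nonlocal pos
        if tokens[pos] != expected:
            raise ParseError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def S():
        if token() == "a":                 # rule 1: S -> ASb
            rules.append(1)
            A()
            S()
            get("b")
        elif token() in ("b", "c", "$"):   # rule 2: S -> C
            rules.append(2)
            C()
        else:
            raise ParseError(f"unexpected {token()!r} in S")

    def A():
        if token() == "a":                 # rule 3: A -> a
            rules.append(3)
            get("a")
        else:
            raise ParseError(f"unexpected {token()!r} in A")

    def C():
        if token() == "c":                 # rule 4: C -> cC
            rules.append(4)
            get("c")
            C()
        elif token() in ("b", "$"):        # rule 5: C -> epsilon
            rules.append(5)
        else:
            raise ParseError(f"unexpected {token()!r} in C")

    S()
    get("$")  # the whole input must have been consumed
    return rules
```

Calling `parse("aaaccbbb")` returns `[1, 3, 1, 3, 1, 3, 2, 4, 4, 5]`, the rules used by the left-most derivation of that string, and `parse("ba")` raises `ParseError` because a remains after S has finished.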
LL(1) parsing

Parsing aaaccbbb with the recursive-descent procedures for S → ASb | C, A → a, C → cC | ε. Each row shows the calls still pending. A row with no input shown expands the first pending call of the row before it.

    Remaining input    Pending calls
    aaaccbbb           S();
                       A(); S(); get(b);
                       get(a); S(); get(b);
    aaccbbb            S(); get(b);
                       A(); S(); get(b); get(b);
                       get(a); S(); get(b); get(b);
    accbbb             S(); get(b); get(b);
                       A(); S(); get(b); get(b); get(b);
                       get(a); S(); get(b); get(b); get(b);
    ccbbb              S(); get(b); get(b); get(b);
                       C(); get(b); get(b); get(b);
                       get(c); C(); get(b); get(b); get(b);
    cbbb               C(); get(b); get(b); get(b);
                       get(c); C(); get(b); get(b); get(b);
    bbb                C(); get(b); get(b); get(b);
                       get(b); get(b); get(b);
    bb                 get(b); get(b);
    b                  get(b);

The expansions follow the left-most derivation:

    S ⇒ ASb ⇒ aSb ⇒ aASbb ⇒ aaSbb ⇒ aaASbbb ⇒ aaaSbbb ⇒ aaaCbbb ⇒ aaacCbbb ⇒ aaaccCbbb ⇒ aaaccbbb


LL(1) grammar

A grammar that has an LL(1) parsing table, i.e., there is no conflict in the parsing table: every entry holds at most one rule.

LL(1) grammars allow ε-productions.


Not every CFG is an LL(1) grammar

    <stmt>    ::= <if-stmt> | s1 | s2
    <if-stmt> ::= if <expr> then <stmt> else <stmt>
                | if <expr> then <stmt>
    <expr>    ::= e1 | e2

The sentence

    if e1 then if e2 then s1 else s2

can be read in two ways, depending on which if the else belongs to:

    if (a > 2)                  if (a > 2)
        if (b > 1)                  if (b > 1)
            b++;                        b++;
        else                    else
            a++;                    a++;


The recursive-descent parser does not work for every CFG

    1. E → E + T
    2. E → T
    3. T → T * F
    4. T → F
    5. F → (E)
    6. F → id

For the input id+id*id:

    E():
        switch (token) {
            case id: E(); ... ... ...
        }

Rules 1 and 3 are left-recursions: E() calls E() again before any token has been consumed, so the parser never terminates.


A left-recursive grammar

    1. A → Aα
    2. A → β

Remove the left-recursion:

    1. A → βA'
    2. A' → αA'
    3. A' → ε

Both grammars generate β followed by zero or more α's. The first grows the string from the right end (A ⇒ Aα ⇒ Aαα ⇒ βαα), the second from the left (A ⇒ βA' ⇒ βαA' ⇒ βααA' ⇒ βαα).


Eliminating left-recursions

    Left-recursive:          Equivalent, without left-recursion:

    1. E → E + T             1. E  → T E'
    2. E → T                 2. E' → + T E'
    3. T → T * F             3. E' → ε
    4. T → F                 4. T  → F T'
    5. F → (E)               5. T' → * F T'
    6. F → id                6. T' → ε
                             7. F  → (E)
                             8. F  → id


An algorithm for eliminating immediate left-recursions

Given a CFG G, let A be one of its non-terminal symbols such that A → Aα.

    1. Add a new non-terminal symbol A' to G;
    2. For each production A → β such that A is not the 1st symbol in β, replace it by A → βA';
    3. For each production A → Aα, replace it by A' → αA';
    4. Add A' → ε to G.

Result:

    1. A → βA'
    2. A' → αA'
    3. A' → ε


Indirect left-recursions

    1. S → Aa
    2. S → b
    3. A → Sd
    4. A → e

No rule is immediately left-recursive, but S ⇒ Aa ⇒ Sda. For example:

    S ⇒ Aa ⇒ Sda ⇒ Aada ⇒ Sdada ⇒ bdada


Indirect left-recursions: removal

Repeat:
    find all immediate left-recursions and, if there are any, remove them;
    remove the last non-terminal symbol Z that has a rule Z → X… by substituting its productions into the rules that use Z.

Original grammar:

    1. S → Aa
    2. S → b
    3. A → Ac
    4. A → Sd
    5. A → e

Remove the immediate left-recursion on A:

    1. S → Aa
    2. S → b
    3. A → SdA'
    4. A → eA'
    5. A' → cA'
    6. A' → ε

Substitute A into S → Aa:

    1. S → SdA'a
    2. S → eA'a
    3. S → b
    4. A' → cA'
    5. A' → ε

Remove the immediate left-recursion on S:

    1. S → eA'aS'
    2. S → bS'
    3. S' → dA'aS'
    4. S' → ε
    5. A' → cA'
    6. A' → ε


An algorithm for eliminating left-recursions

Given a CFG G, let A1, A2, ..., An be its non-terminal symbols.

    for i := n down to 1 do {
        for j := n down to i+1 do {
            // find one level of indirection
            for each production Ai → Aj ω do {
                remove Ai → Aj ω;
                for each production Aj → γ, add Ai → γ ω to the grammar;
            }
        } // end for j
        eliminate the immediate left-recursions caused by Ai
    } // end for i


A grammar for if statements

    1. S → iCtSE
    2. S → a
    3. E → eS
    4. E → ε
    5. C → b

         a    b    e      i    t    $
    S    2                1
    E              3,4              4
    C         5

Is it an LL(1) grammar? Is there an LL(1) parsing table for it? No! The entry for E on e holds both rule 3 and rule 4.


Why is there a conflict?

Input: ibtibtae……

    S ⇒ ibtSE… ⇒ ibt ibtSE E… ⇒ ibt ibtaE E…

    Rule 4 (E → ε):    ibt ibta E… ⇒ ibt ibta eS…    (e goes with the outer i)
    Rule 3 (E → eS):   ibt ibtaeS E…                 (e goes with the inner i)

With one symbol of look-ahead, e, both continuations are possible.


Questions about ambiguity

Can we have an unambiguous equivalent grammar for this grammar?
    Yes! In general, no: some inherently ambiguous languages exist.

Can we write a program to test whether a given grammar is ambiguous?
    No!

Can we write a program to get an unambiguous equivalent grammar from any grammar of a language that is known to be not inherently ambiguous?
    No!


Is there an LL(2) grammar?

Yes! We need to look two symbols ahead in order to determine which rule should be used.

    { a^m b^n c | m ≥ 1 and n ≥ 0 }        example: aaaaabbbbc

    1. S → AB
    2. A → aA
    3. A → a
    4. B → bB
    5. B → c

LL(2) parsing table:

    Non-terminal    Look-ahead                 Rule
    S               a (any second symbol)      1
    A               aa                         2
    A               ab, ac                     3
    B               b (any second symbol)      4
    B               c                          5


LL(2) parsing

Parsing aaabc:

    Remaining input    Pending calls
    aaabc              S();
                       A(); B();
                       get(a); A(); B();
    aabc               A(); B();
                       get(a); A(); B();
    abc                A(); B();
                       get(a); B();
    bc                 B();
                       get(b); B();
    c                  B();
                       get(c);


Is there an LL(1) grammar equivalent to the following LL(2) grammar?

Yes.

    { a^m b^n c | m ≥ 1 and n ≥ 0 }

    LL(2) grammar:      Equivalent LL(1) grammar:

    1. S → AB           1. S → aAB
    2. A → aA           2. A → aA
    3. A → a            3. A → ε
    4. B → bB           4. B → bB
    5. B → c            5. B → c


No left-recursive grammar is an LL(k) grammar

But we can effectively find an equivalent one.

    1. S → SA
    2. S → a
    3. A → b

    S ⇒ SA ⇒ SAA ⇒ SAAA ⇒ SAAAA ⇒ ... ⇒ aAA...A ⇒ ... ⇒ abb...b

No fixed number of look-ahead symbols tells the parser how many times rule 1 must be applied. An equivalent grammar without left-recursion:

    1. S → a S'
    2. S' → A S'
    3. S' → ε
    4. A → b

The same holds for the expression grammar:

    E → E + T | T              E  → T E'
    T → T * F | F              E' → + T E' | ε
    F → (E) | id               T  → F T'
                               T' → * F T' | ε
                               F  → (E) | id

Are we happy with this?


Does every LL(2) grammar have an equivalent LL(1) grammar?

No.

    LL(2) grammar:      LL(k) grammar, k ≥ 2:

    1. S → aSA          1. S → aSA
    2. S → ε            2. S → ε
    3. A → abS          3. A → a^(k-1) b S
    4. A → c            4. A → c

    no equivalent       no equivalent
    LL(1) grammar       LL(k-1) grammar        (Kurki-Suonio [1969])

    LL(1) ⊂ LL(2) ⊂ LL(3) ⊂ ..... ⊂ LL(k) ⊂ LL(k+1) ⊂ ...


LL(k) grammar, k ≥ 2

    1. S → aSA
    2. S → ε
    3. A → a^(k-1) b S
    4. A → c

    (Kurki-Suonio [1969])

This grammar is unambiguous, as every LL(k) grammar is.

Is there an unambiguous CFG that is not an LL(k) grammar? Yes.

There exists a DCFL that is not LL(k) for any k (Stearns [1970]):

    { a^n | n ≥ 0 } ∪ { a^n b^n | n ≥ 0 }


LL(1) parser implementation

    1. E  → T E'
    2. E' → + T E'
    3. E' → ε
    4. T  → F T'
    5. T' → * F T'
    6. T' → ε
    7. F  → (E)
    8. F  → n

          n    +    *    (    )    $
    E     1              1
    E'         2              3    3
    T     4              4
    T'         6    5         6    6
    F     8              7

p.s. Let n be any positive integer less than 32767.

Programming assignment: details will be announced later.
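The LL(1) parsing table for the expression grammar (rules 1 to 8, with E → T E' as rule 1 and F → n as rule 8) can also drive a parser that uses an explicit stack instead of recursive procedures. The following is a minimal Python sketch under stated assumptions, not the assignment's required design. The names `GRAMMAR`, `TABLE`, `tokenize` and `ll1_parse` are hypothetical, every run of digits is treated as the token n without checking the 32767 bound, and the output is the list of rule numbers rather than a syntax tree.

```python
import re

# Rule number -> (left-hand side, right-hand side); [] stands for epsilon.
GRAMMAR = {
    1: ("E", ["T", "E'"]),
    2: ("E'", ["+", "T", "E'"]),
    3: ("E'", []),
    4: ("T", ["F", "T'"]),
    5: ("T'", ["*", "F", "T'"]),
    6: ("T'", []),
    7: ("F", ["(", "E", ")"]),
    8: ("F", ["n"]),
}

# LL(1) parsing table: non-terminal -> {look-ahead token: rule number}.
TABLE = {
    "E": {"n": 1, "(": 1},
    "E'": {"+": 2, ")": 3, "$": 3},
    "T": {"n": 4, "(": 4},
    "T'": {"+": 6, "*": 5, ")": 6, "$": 6},
    "F": {"n": 8, "(": 7},
}


def tokenize(text):
    """Turn text into tokens: 'n' for each integer, plus + * ( ) and a final $."""
    tokens = []
    for number, symbol in re.findall(r"(\d+)|(\S)", text):
        if number:
            tokens.append("n")
        elif symbol in "+*()":
            tokens.append(symbol)
        else:
            raise ValueError(f"unexpected character {symbol!r}")
    return tokens + ["$"]


def ll1_parse(text):
    """Return the rule numbers of a left-most derivation of text."""
    tokens = tokenize(text)
    stack = ["$", "E"]  # start symbol on top of the end-of-file marker
    pos = 0
    rules = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in TABLE:
            # Non-terminal: pick the rule from the table and expand it.
            rule = TABLE[top].get(look)
            if rule is None:
                raise ValueError(f"no rule for {top} on {look!r}")
            rules.append(rule)
            stack.extend(reversed(GRAMMAR[rule][1]))
        elif top == look:
            # Terminal (or $): it must match the look-ahead token.
            pos += 1
        else:
            raise ValueError(f"expected {top!r}, found {look!r}")
    return rules
```

For example, `ll1_parse("2+3*4")` returns `[1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3]`, and `ll1_parse("2+")` raises `ValueError` because the table has no entry for T on `$`. Keeping the grammar and the table as data means a different LL(1) grammar only needs different dictionaries, not new code.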