Discussion #5
LL(1) Grammars & Table-Driven Parsing

Topics
• Approaches to Parsing
  – Full backtracking
  – Deterministic
• Simple LL(1), table-driven parsing
• Improvements to simple LL(1) grammars

Prefix Expression Grammar
• Consider the following grammar (which yields prefix expressions for binary operators):
    E → N | OEE
    O → + | - | * | /
    N → 0 | 1 | 2 | 3 | 4
• Here, prefix expressions associate an operator with the next two operands.
    *+234    (* (+ 2 3) 4)    (2 + 3) * 4 = 20
    *2+34    (* 2 (+ 3 4))    2 * (3 + 4) = 14

Top-Down Parsing with Backtracking
    E → N | OEE
    O → + | - | * | /
    N → 0 | 1 | 2 | 3 | 4
[Figure: backtracking search tree for the input *+342. Starting from E, the parser tries E → N and each alternative for N in turn, backs up, then tries E → OEE and each alternative for O, and repeats the search for every remaining E.]

What are the obvious problems?
• We never know what production to try.
• It appears to be terribly inefficient, and it is.
• Are there grammars for which we can always know what rule to choose? Yes!
• Characteristics:
  – Only single-symbol look-ahead
  – Given a non-terminal and a current symbol, we always know which production rule to apply

LL(1) Parsers
• An LL parser parses the input from Left to right, and constructs a Leftmost derivation of the sentence.
• An LL(k) parser uses k tokens of look-ahead.
• LL(1) parsers, although fairly restrictive, are attractive because they only need to look at the current non-terminal and the next token to make their parsing decisions.
• LL(1) parsers require LL(1) grammars.

Simple LL(1) Grammars
For simple LL(1) grammars all rules have the form
    A → a_1 α_1 | a_2 α_2 | … | a_n α_n
where
• a_i is a terminal, 1 ≤ i ≤ n
• a_i ≠ a_j for i ≠ j, and
• α_i is a sequence of terminals and non-terminals or is empty, 1 ≤ i ≤ n

Creating Simple LL(1) Grammars
• Why is this not a simple LL(1) grammar?
    E → N | OEE
    O → + | - | * | /
    N → 0 | 1 | 2 | 3 | 4
• How can we change it to simple LL(1)?
• By making all production rules of the form:
    A → a_1 α_1 | a_2 α_2 | … | a_n α_n
• Thus,
    E → 0 | 1 | 2 | 3 | 4 | +EE | -EE | *EE | /EE
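
The simple LL(1) condition can be checked mechanically. The following is a minimal sketch, assuming single-character symbols and a grammar stored as a dictionary from each non-terminal to its list of alternatives; the names `is_simple_ll1`, `ORIGINAL`, and `REWRITTEN` are illustrative and do not come from the slides.

```python
def is_simple_ll1(grammar, terminals):
    """Return True if every alternative starts with a terminal and, within
    each rule, no two alternatives start with the same terminal."""
    for alternatives in grammar.values():
        seen = set()
        for alt in alternatives:
            # An empty alternative or one that starts with a non-terminal
            # violates the form A -> a_1 alpha_1 | ... | a_n alpha_n.
            if not alt or alt[0] not in terminals:
                return False
            # Two alternatives with the same leading terminal violate a_i != a_j.
            if alt[0] in seen:
                return False
            seen.add(alt[0])
    return True


TERMINALS = set("+-*/01234")

# The prefix-expression grammar as first given.
ORIGINAL = {
    "E": ["N", "OEE"],
    "O": ["+", "-", "*", "/"],
    "N": ["0", "1", "2", "3", "4"],
}

# The same language with O and N substituted into the E rule.
REWRITTEN = {
    "E": ["0", "1", "2", "3", "4", "+EE", "-EE", "*EE", "/EE"],
}

print(is_simple_ll1(ORIGINAL, TERMINALS))   # False
print(is_simple_ll1(REWRITTEN, TERMINALS))  # True
```

The original grammar fails because both alternatives of E begin with a non-terminal; the rewritten grammar passes because each alternative begins with a distinct terminal.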

Example: LL(1) Parsing
    E → (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)-EE | (8)*EE | (9)/EE

*+234
    E (8) → * E E
        E (6) → + E E
            E (3) → 2
            E (4) → 3
        E (5) → 4
    Success! Output = 8 6 3 4 5

2*3
    E (3) → 2
    ? (the input *3 remains, but nothing is left to expand)
    Fail!

Simple LL(1) Parse Table
A parse table is defined as follows:
    (V ∪ {#}) × (V_T ∪ {#}) → {(β, i), pop, accept, error}
where
  – β is the right side of production number i
  – # marks the end of the input string (# ∉ V)
If A ∈ (V ∪ {#}) is the symbol on top of the stack and a ∈ (V_T ∪ {#}) is the current input symbol, then:
    ACTION(A, a) =
        pop        if A = a for a ∈ V_T
        accept     if A = # and a = #
        (aα, i)    which means "pop, then push aα and output i" (A → aα is the i-th production)
        error      otherwise

Parse Table
    E → (1)0 | (2)1 | (3)2 | (4)3 | (5)+EE | (6)*EE

Rows are the stack symbols V ∪ {#}; columns are the input symbols V_T ∪ {#}.

         0       1       2       3       +         *         #
    E    (0,1)   (1,2)   (2,3)   (3,4)   (+EE,5)   (*EE,6)
    0    pop
    1            pop
    2                    pop
    3                            pop
    +                                    pop
    *                                              pop
    #                                                        accept

All blank entries are error.

Parsing *+123 with the table above:

    Action                                  Stack    Input     Output
    Initialize                              E#       *+123#
    ACTION(E,*) = Replace [E,*EE], Out 6    *EE#     *+123#    6
    ACTION(*,*) = pop(*,*)                  EE#      +123#     6
    ACTION(E,+) = Replace [E,+EE], Out 5    +EEE#    +123#     65
    ACTION(+,+) = pop(+,+)                  EEE#     123#      65
    ACTION(E,1) = Replace [E,1], Out 2      1EE#     123#      652
    ACTION(1,1) = pop(1,1)                  EE#      23#       652
    ACTION(E,2) = Replace [E,2], Out 3      2E#      23#       6523
    ACTION(2,2) = pop(2,2)                  E#       3#        6523
    ACTION(E,3) = Replace [E,3], Out 4      3#       3#        65234
    ACTION(3,3) = pop(3,3)                  #        #         65234
    ACTION(#,#) = accept                                       Done!

The Input column shows the input that remains to be read at each step.

Simple LL(1): More Restrictive than Necessary
• Simple LL(1) grammars are very easy and efficient to parse but also very restrictive.
• The good news: we can achieve the same desirable results without being so restrictive.
• How? We only need to retain the restriction that single-symbol look-ahead uniquely determines which rule to use.
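
The table-driven parse of *+123 traced earlier can be sketched in a few lines. This is an illustration under stated assumptions, not the course's reference implementation: symbols are single characters, the stack is a Python list whose last element is the top, and the names `RULES`, `TABLE`, and `parse` are hypothetical.

```python
# Productions of the six-rule simple LL(1) grammar, keyed by rule number.
RULES = {
    1: ("E", "0"), 2: ("E", "1"), 3: ("E", "2"), 4: ("E", "3"),
    5: ("E", "+EE"), 6: ("E", "*EE"),
}
TERMINALS = set("0123+*")

# For a non-terminal A and input symbol a, ACTION(A, a) is the rule whose
# right side starts with a. Missing keys are the "error" entries.
TABLE = {(lhs, rhs[0]): (rhs, num) for num, (lhs, rhs) in RULES.items()}


def parse(text, start="E"):
    """Return the list of production numbers used, or raise SyntaxError."""
    stack = ["#", start]            # end marker at the bottom, start symbol on top
    tokens = list(text) + ["#"]     # end marker appended to the input
    pos = 0
    output = []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#":
            if pos == len(tokens) - 1:
                return output       # ACTION(#, #) = accept
            raise SyntaxError(f"input remains at position {pos}")
        if top in TERMINALS and top == look:
            stack.pop()             # pop: matched terminal, advance the input
            pos += 1
        elif (top, look) in TABLE:
            rhs, num = TABLE[(top, look)]
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
            output.append(num)
        else:
            raise SyntaxError(f"no action for ({top!r}, {look!r})")


print(parse("*+123"))   # [6, 5, 2, 3, 4]
```

Calling `parse("2*3")` raises `SyntaxError`, because after rule 3 matches the 2 only the end marker is left on the stack while *3 is still unread.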

Relaxing Simple LL(1) Restrictions
• Consider the following grammar, which is not simple LL(1):
    E → (1)N | (2)OEE
    O → (3)+ | (4)*
    N → (5)0 | (6)1 | (7)2 | (8)3
• What are the problem rules? (1) & (2)
• Observe that it is possible to distinguish between rules 1 and 2.
  – N leads to {0, 1, 2, 3}
  – O leads to {+, *}
  – {0, 1, 2, 3} ∩ {+, *} = ∅
  – Thus, if we see 0, 1, 2, or 3 we choose (1), and if we see + or *, we choose (2).

LL(1) Grammars
• FIRST(α) = {a | α ⇒* aβ and a ∈ V_T}
• A grammar is LL(1) if for all rules of the form
    A → α_1 | α_2 | … | α_n
  the sets FIRST(α_1), FIRST(α_2), …, and FIRST(α_n) are pair-wise disjoint; that is,
    FIRST(α_i) ∩ FIRST(α_j) = ∅ for i ≠ j

    E → (1)N | (2)OEE
    O → (3)+ | (4)*
    N → (5)0 | (6)1 | (7)2 | (8)3

For (A, a), we select (α, i) if a ∈ FIRST(α) and α is the right-hand side of rule i.

Rows are the stack symbols V ∪ {#}; columns are the input symbols V_T ∪ {#}.

         +         *         0       1       2       3       #
    E    (OEE,2)   (OEE,2)   (N,1)   (N,1)   (N,1)   (N,1)
    O    (+,3)     (*,4)
    N                        (0,5)   (1,6)   (2,7)   (3,8)
    +    pop
    *              pop
    0                        pop
    1                                pop
    2                                        pop
    3                                                pop
    #                                                        accept

Parsing *+123 with the table above:

    Action                                  Stack    Input     Output
    Initialize                              E#       *+123#
    ACTION(E,*) = Replace [E,OEE], Out 2    OEE#     *+123#    2
    ACTION(O,*) = Replace [O,*], Out 4      *EE#     *+123#    24
    ACTION(*,*) = pop(*,*)                  EE#      +123#     24
    ACTION(E,+) = Replace [E,OEE], Out 2    OEEE#    +123#     242
    ACTION(O,+) = Replace [O,+], Out 3      +EEE#    +123#     2423
    ACTION(+,+) = pop(+,+)                  EEE#     123#      2423
    ACTION(E,1) = Replace [E,N], Out 1      NEE#     123#      24231
    ACTION(N,1) = Replace [N,1], Out 6      1EE#     123#      242316
    ACTION(1,1) = pop(1,1)                  EE#      23#       242316
    ACTION(E,2) = Replace [E,N], Out 1      NE#      23#       2423161
    ACTION(N,2) = Replace [N,2], Out 7      2E#      23#       24231617
    ACTION(2,2) = pop(2,2)                  E#       3#        24231617
    ACTION(E,3) = Replace [E,N], Out 1      N#       3#        242316171
    ACTION(N,3) = Replace [N,3], Out 8      3#       3#        2423161718
    ACTION(3,3) = pop(3,3)                  #        #         2423161718
    ACTION(#,#) = accept                                       Done!

The Input column shows the input that remains to be read at each step.
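
The FIRST-set construction and the resulting parse can be sketched as follows. This is a minimal illustration, assuming single-character symbols and no empty right-hand sides (the definition of FIRST used here does not cover them); `GRAMMAR`, `first`, `build_table`, and `parse` are hypothetical names.

```python
# Rule number -> (left side, right side) for the eight-rule grammar.
GRAMMAR = {
    1: ("E", "N"), 2: ("E", "OEE"),
    3: ("O", "+"), 4: ("O", "*"),
    5: ("N", "0"), 6: ("N", "1"), 7: ("N", "2"), 8: ("N", "3"),
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}


def first(symbols, seen=frozenset()):
    """FIRST of a non-empty symbol string (assumes no empty right sides)."""
    head = symbols[0]
    if head not in NONTERMINALS:
        return {head}                      # a terminal is its own FIRST set
    if head in seen:
        return set()                       # already expanding this non-terminal
    result = set()
    for lhs, rhs in GRAMMAR.values():
        if lhs == head:
            result |= first(rhs, seen | {head})
    return result


def build_table():
    """Map (non-terminal, terminal) to (right side, rule number)."""
    table = {}
    for num, (lhs, rhs) in GRAMMAR.items():
        for a in first(rhs):
            if (lhs, a) in table:          # FIRST sets overlap: not LL(1)
                raise ValueError(f"conflict at ({lhs}, {a})")
            table[(lhs, a)] = (rhs, num)
    return table


def parse(text, start="E"):
    """Return the production numbers in the order they are applied."""
    table = build_table()
    stack, tokens, pos, output = ["#", start], list(text) + ["#"], 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#":
            if pos == len(tokens) - 1:
                return output              # accept
            raise SyntaxError(f"input remains at position {pos}")
        if top not in NONTERMINALS and top == look:
            stack.pop()                    # pop a matched terminal
            pos += 1
        elif (top, look) in table:
            rhs, num = table[(top, look)]
            stack.pop()
            stack.extend(reversed(rhs))    # leftmost symbol ends up on top
            output.append(num)
        else:
            raise SyntaxError(f"no action for ({top!r}, {look!r})")


print(sorted(first("OEE")))   # ['*', '+']
print(parse("*+123"))         # [2, 4, 2, 3, 1, 6, 1, 7, 1, 8]
```

`build_table` raises `ValueError` when two alternatives of the same non-terminal share a FIRST symbol, which is exactly the case the pair-wise disjointness condition rules out.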

What does 2 4 2 3 1 6 1 7 1 8 mean?
    E → (1)N | (2)OEE
    O → (3)+ | (4)*
    N → (5)0 | (6)1 | (7)2 | (8)3

    E (2) OEE
        O (4) *
        E (2) OEE
            O (3) +
            E (1) N
                N (6) 1
            E (1) N
                N (7) 2
        E (1) N
            N (8) 3

2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal.
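
To make the preorder reading concrete, the following sketch rebuilds the parse tree from the rule-number sequence. It assumes single-character symbols and represents each interior node as a `(symbol, rule number, children)` tuple; `build_tree` and `to_infix` are illustrative names, not part of the slides.

```python
# Rule number -> (left side, right side) for the eight-rule grammar.
GRAMMAR = {
    1: ("E", "N"), 2: ("E", "OEE"),
    3: ("O", "+"), 4: ("O", "*"),
    5: ("N", "0"), 6: ("N", "1"), 7: ("N", "2"), 8: ("N", "3"),
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}


def build_tree(rule_numbers, start="E"):
    """Consume rule numbers in preorder and return the parse tree."""
    numbers = iter(rule_numbers)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                          # leaf: a terminal
        num = next(numbers, None)                  # next rule in preorder
        if num is None:
            raise ValueError("ran out of rule numbers")
        lhs, rhs = GRAMMAR[num]
        if lhs != symbol:
            raise ValueError(f"rule {num} does not expand {symbol}")
        # Children are expanded left to right, which is what makes the
        # output of a leftmost derivation a preorder listing.
        return (symbol, num, [expand(s) for s in rhs])

    tree = expand(start)
    if next(numbers, None) is not None:
        raise ValueError("unused rule numbers")
    return tree


def to_infix(tree):
    """Render the tree as a fully parenthesized infix expression."""
    if isinstance(tree, str):
        return tree
    children = tree[2]
    if len(children) == 3:                         # E -> O E E
        op, left, right = (to_infix(c) for c in children)
        return f"({left} {op} {right})"
    return to_infix(children[0])                   # E -> N, O -> op, N -> digit


tree = build_tree([2, 4, 2, 3, 1, 6, 1, 7, 1, 8])
print(to_infix(tree))   # ((1 + 2) * 3)
```

The recovered expression matches the input *+123 read as a prefix expression, so the rule sequence alone carries the full structure of the parse.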