• Review:
  – How do we define a grammar (what are the components of a grammar)?
  – What is a context-free grammar?
  – What is the language defined by a grammar?
  – What is an ambiguous grammar?
  – Why do we care about leftmost or rightmost derivations?
• Example:
    <PROGRAM>   -> 'program' id 'begin' <STMT_LIST> 'end'
    <STMT_LIST> -> <STMT> ';' <STMT_LIST> | <STMT>
    <STMT>      -> id '=' <EXPR>
    <EXPR>      -> <EXPR> <OP> <EXPR> | id
    <OP>        -> '+' | '-' | '*' | '/'

  Sample program:
    program test begin t0 = t1 + t2; t3 = t0 * t4 end

  The program is valid if the start symbol derives it:
    <PROGRAM> ==>* program test begin t0 = t1 + t2; t3 = t0 * t4 end
• Parsing:
  – The process of determining whether the start symbol can derive the program.
    • If it succeeds, the program is valid.
    • If it fails, the program is invalid.
  – Two general approaches:
    • Expanding from the start symbol to the whole program (top-down).
    • Reducing the whole program to the start symbol (bottom-up).
• Parsing methods:
  – Universal:
    • There exist algorithms that can parse any context-free grammar, but they are too inefficient to be used in practice.
    • What is considered efficient? Scanning the program once, from left to right.
  – Top-down parsing:
    • Builds the parse tree from the root to the leaves (using a leftmost derivation; why?).
    • Recursive descent and LL parsers.
  – Bottom-up parsing:
    • Builds the parse tree from the leaves to the root.
    • Operator-precedence parsing, LR (SLR, canonical LR, LALR).
  – Recursive descent parsing associates a procedure with each nonterminal in the grammar. It may require backtracking over the input string.
  – Example:
      <type>   -> <simple> | ^ id | array [ <simple> ] of <type>
      <simple> -> integer | char | num dotdot num

      /* One procedure per nonterminal; lookahead holds the current token. */
      void type() {
          if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
              simple();                       /* <type> -> <simple> */
          else if (lookahead == '^') {        /* <type> -> ^ id */
              match('^'); match(ID);
          } else if (lookahead == ARRAY) {    /* <type> -> array [ <simple> ] of <type> */
              match(ARRAY); match('['); simple(); match(']'); match(OF); type();
          } else
              error();
      }

      void simple() {
          if (lookahead == INTEGER)
              match(INTEGER);
          else if (lookahead == CHAR)
              match(CHAR);
          else if (lookahead == NUM) {        /* <simple> -> num dotdot num */
              match(NUM); match(DOTDOT); match(NUM);
          } else
              error();
      }

      /* Consume the expected token, or report a syntax error. */
      void match(token t) {
          if (lookahead == t)
              lookahead = nexttoken();
          else
              error();
      }
  – Recursive descent parsing may require backtracking over the input string.
    • Try the productions in turn and backtrack when one fails.
    • E.g., S -> cAd, A -> ab | a
    • Input string: cad. Trying A -> ab first matches a, then fails on b against d; the parser must back up to a and try A -> a.
  – A special case of recursive descent parsing that needs no backtracking is called predictive parsing.
    • By looking at the input, the parser must predict the right production every time to avoid backtracking.
    • It needs to know which first symbols can be generated by the right side of each production (it looks ahead only one token).
  – First(α): the set of tokens that can appear as the first symbol of one or more strings generated from α. If α is the empty string or can generate the empty string, then the empty string is also in First(α).
  – Given productions A -> α | β, predictive parsing (looking one token ahead) requires First(α) and First(β) to be disjoint.
  – Predictive parsing does not work on some kinds of grammars:
    • Left recursion: A -> Aw (expanding A results in an infinite loop).
    • Common left factors: A -> aB | aC (First(aB) and First(aC) are not disjoint).
  – Eliminating left recursion
    • Immediate left recursion
      – Replace A -> Aα | β with A -> βA' and A' -> αA' | ε
      – Example: E -> E+T | T    T -> T*F | F    F -> (E) | id
      – In general,
            A -> Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
        (where no βi begins with A) can be replaced by
            A  -> β1A' | β2A' | ... | βnA'
            A' -> α1A' | α2A' | ... | αmA' | ε
    • Algorithm 4.1. Eliminating left recursion:
          Arrange the nonterminals in some order A1, A2, ..., An
          for i = 1 to n do begin
              for j = 1 to i-1 do begin
                  replace each production of the form Ai -> Aj w by
                  Ai -> d1 w | d2 w | ... | dk w, where Aj -> d1 | d2 | ... | dk
                  are all the current Aj productions
              end for
              eliminate the immediate left recursion among the Ai productions
          end for
      (The algorithm can fail if the grammar has a cycle (A ==>+ A) or an ε-production (A -> ε).)
      Example 1: S -> Aa | b    A -> Ac | Sd | ε
      Example 2: X -> YZ | a    Y -> ZX | Xb    Z -> XY | ZZ | a
  – Left factoring (to produce a grammar suitable for predictive parsing)
    • Replace productions
          A -> αβ1 | αβ2 | ... | αβn | γ1 | ... | γm
      by
          A  -> αA' | γ1 | ... | γm
          A' -> β1 | β2 | ... | βn
      Example: S -> iEtS | iEtSeS | a    E -> b
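
As a sketch of how left-recursion elimination enables predictive parsing: once immediate left recursion is removed from the expression grammar E -> E+T | T, T -> T*F | F, F -> (E) | id, each nonterminal can get its own procedure that picks a production from a single lookahead token. The C code below is one possible illustration, not a reference implementation. It assumes single-character tokens with 'i' standing for id. The names parse_expr, expr_rest, term_rest, and the ok flag are hypothetical helpers introduced here, and match() records an error in ok instead of calling error().

```c
/* Grammar after eliminating immediate left recursion:
 *   E  -> T E'
 *   E' -> + T E' | e        (e = empty string)
 *   T  -> F T'
 *   T' -> * F T' | e
 *   F  -> ( E ) | id
 * Assumption of this sketch: tokens are single characters, 'i' is id.
 */

static const char *input;   /* remaining input; *input is the lookahead */
static int ok;              /* cleared on the first syntax error */

/* Consume the expected token, or record a syntax error. */
static void match(char t) {
    if (ok && *input == t)
        input++;
    else
        ok = 0;
}

static void expr(void);     /* forward declaration: F refers back to E */

/* F -> ( E ) | id */
static void factor(void) {
    if (*input == '(') {
        match('(');
        expr();
        match(')');
    } else {
        match('i');
    }
}

/* T' -> * F T' | e : choose by one token of lookahead */
static void term_rest(void) {
    if (ok && *input == '*') {
        match('*');
        factor();
        term_rest();
    }
    /* otherwise take the empty production */
}

/* T -> F T' */
static void term(void) {
    factor();
    term_rest();
}

/* E' -> + T E' | e */
static void expr_rest(void) {
    if (ok && *input == '+') {
        match('+');
        term();
        expr_rest();
    }
    /* otherwise take the empty production */
}

/* E -> T E' */
static void expr(void) {
    term();
    expr_rest();
}

/* Returns 1 if s is derivable from E and fully consumed, else 0. */
int parse_expr(const char *s) {
    input = s;
    ok = 1;
    expr();
    return ok && *input == '\0';
}
```

For instance, parse_expr("i+i*i") returns 1 and parse_expr("i+*i") returns 0. No procedure calls itself before consuming a token, so the infinite loop caused by E -> E+T cannot occur, and no backtracking is needed.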