COP4020 Programming Languages
Parsing
Prof. Xin Yuan
5/29/2016 COP4020 Spring 2014

Overview
Top-down and bottom-up parsing
Recursive descent parsing
Table-driven LL(1) parsing

Parsing
Parsing is the process of determining whether the start symbol can derive the program. If it can, the program is valid. If it cannot, the program is invalid.
There are two general approaches:
Expanding from the start symbol to the whole program (top-down)
Reducing from the whole program to the start symbol (bottom-up)

[Figure: parse tree for identifier * identifier + identifier. The root <expression> expands to <expression> <operator> <expression>, and one of its <expression> children expands the same way.]

Top-down parsing
Top-down parsing builds the parse tree from the root to the leaves (using leftmost derivation; why?).
Recursive descent parser
LL parser
  First L: left-to-right scan
  Second L: leftmost derivation

Recursive descent parsing
Recursive descent parsing associates a procedure with each nonterminal in the grammar. It may require backtracking of the input string.

Example:
    <type>   -> <simple> | ^ id | array [ <simple> ] of <type>
    <simple> -> integer | char | num dotdot num

    void type()
    {
        if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
            simple();
        else if (lookahead == '^') {
            match('^'); match(ID);
        }
        else if (lookahead == ARRAY) {
            match(ARRAY); match('['); simple();
            match(']'); match(OF); type();
        }
        else error();
    }

    void simple()
    {
        if (lookahead == INTEGER) match(INTEGER);
        else if (lookahead == CHAR) match(CHAR);
        else if (lookahead == NUM) {
            match(NUM); match(DOTDOT); match(NUM);
        }
        else error();
    }

    void match(token t)
    {
        if (lookahead == t) { lookahead = nexttoken(); }
        else error();
    }

Backtracking
In general, a recursive descent parser may have to back up over the input string: it tries the productions of a nonterminal in turn and backtracks when one fails.
E.g. S -> cAd, A -> ab | a, input string cad. Trying A -> ab first matches a but then fails to match b against d, so the parser backs up to the point where A was expanded and tries A -> a, which succeeds.
A special case of recursive descent parser that needs no backtracking is called a predictive parser.
A predictive parser looks at the input string and must predict the right production every time to avoid backtracking: LL(1).
It needs to know which first symbols can be generated by the right side of each production.
It looks ahead only one token.

Non-recursive predictive parsing (table-driven LL(1) parsing)
A predictive parser can be implemented by recursive descent parsing.
Requirement: by looking at the first terminal symbol that a nonterminal symbol can derive, we should be able to choose the correct production to expand the nonterminal symbol. (For an empty production such as E' -> ε, the choice is based on the terminals that can follow the nonterminal.)
If the requirement is met, the parser can easily be implemented using a non-recursive scheme by building a parsing table.

A parsing table example
    (1) E  -> TE'
    (2) E' -> +TE'
    (3) E' -> ε
    (4) T  -> FT'
    (5) T' -> *FT'
    (6) T' -> ε
    (7) F  -> (E)
    (8) F  -> id

           id    +     *     (     )     $
    E      (1)               (1)
    E'           (2)               (3)   (3)
    T      (4)               (4)
    T'           (6)   (5)         (6)   (6)
    F      (8)               (7)

Using the parsing table, the predictive parsing program works with:
A stack of grammar symbols ($ on the bottom)
A string of input tokens ($ at the end)
A parsing table, M[NT, T], of productions

Algorithm: put '$ Start' on the stack ($ also marks the end of the input string).
1) If top == input == $, then accept.
2) If top == input, then pop the top of the stack, advance to the next input symbol, and go to 1.
3) If top is a nonterminal: if M[top, input] is a production, then replace top with the right side of that production (its leftmost symbol ends up on top of the stack) and go to 1; else error.
4) Else error.

Example (same grammar and parsing table as above; the top of the stack is at the right):

    Stack        Input          Production
    $E           id+id*id$      E -> TE'
    $E'T         id+id*id$      T -> FT'
    $E'T'F       id+id*id$      F -> id
    $E'T'id      id+id*id$
    $E'T'        +id*id$
    ...

This produces the leftmost derivation: E => TE' => FT'E' => idT'E' => ... => id+id*id